Pressure Points in Reading Comprehension: A Quantile Multiple Regression Analysis


Language  and  Reading  Research  Consortium  and  Jessica  Logan 

Ohio  State  University 


The  goal  of  this  study  was  to  examine  how  selected  pressure  points  or  areas  of  vulnerability  are  related 
to  individual  differences  in  reading  comprehension  and  whether  the  importance  of  these  pressure  points 
varies  as  a  function  of  the  level  of  children’s  reading  comprehension.  A  sample  of  245  third-grade 
children  were  given  an  assessment  battery  that  included  multiple  measures  of  vocabulary,  grammar, 
higher-level  language  ability,  word  reading,  working  memory,  and  reading  comprehension.  Ordinary 
least  squares  (OLS)  and  quantile  regression  analyses  were  undertaken.  OLS  regression  analyses  indicated 
that  all  variables  except  working  memory  accounted  for  unique  variance  in  reading  comprehension. 
However,  quantile  regression  showed  that  the  extent  of  the  relationships  varied  in  some  cases  across 
readers  of  different  ability  levels.  Results  suggest  that  quantile  regression  may  be  a  useful  approach  for 
the  study  of  reading  in  both  typical  and  atypical  readers  and  aid  greater  specification  of  componential 
models  of  reading  comprehension  across  the  ability  range. 

Keywords:  reading  comprehension,  language,  quantile  regression 


Reading comprehension is a complex activity informed by multiple language and cognitive skills,
in addition to word reading ability. Research to date has sought to determine the skills that
predict reading comprehension outcomes in unselected samples (Oakhill & Cain, 2012; Vellutino,
Tunmer, Jaccard, & Chen, 2007) or to identify candidate causes of poor reading comprehension
(Cain & Oakhill, 2006; Catts, Adlof, & Weismer, 2006; Nation, Clarke, Marshall, & Durand, 2004).
What this work does not tell us is which, if any, of these skills make a unique contribution to
reading comprehension outcomes and also whether the skills that predict reading comprehension in
general are the same for those with poor, average, or good reading comprehension. We seek to
address these significant gaps in knowledge. Our aims are to identify the skills that uniquely
predict reading comprehension across the ability range, and to determine whether their importance
varies as a function of the level of children’s reading comprehension. First, we consider our
theoretical framework and variables, and then explain our analytic approach.

Much of the research on the component skills that predict reading comprehension has focused on
children with reading comprehension problems. This work has demonstrated many factors that are
associated with reading comprehension difficulties, including word reading, language skills, and
cognitive resources such as working memory (Cain & Oakhill, 2006; Catts et al., 2006; Nation
et al., 2004). Perfetti, Stafura, and Adlof (2014) draw on this body of research to propose three
groups of “pressure points” or areas of vulnerability in the reading system that might account for
poor reading comprehension. These are (a) word-level processes (word decoding, semantic),
(b) higher-level comprehension processes (e.g., inferencing, comprehension monitoring), and
(c) general cognitive abilities (e.g., working memory). We adopt Perfetti et al.’s terminology of
pressure points because of our interest in the skills that might lead to comprehension breakdown,
but we propose a slightly different and expanded categorization of potential pressure points as
outlined below. The child-level variables that we consider each meet the criteria for a potential
pressure point: They are integral to text comprehension, have face validity as skills that may be
causally related to text comprehension as well as being robust correlates of reading comprehension
skill, and are potentially malleable through instruction and intervention (Perfetti & Adlof, 2012).

Word-level processes are an important first step in the reading comprehension process. The ability
to accurately and efficiently decode and recognize printed words is critical to building an
understanding of the text (Perfetti & Hart, 2002). Indeed, children with word reading problems
(i.e., dyslexia) often have significant deficits in reading comprehension (Shankweiler et al.,
1999). Beyond decoding, word-level processes include the ability to access and use word meaning.
Reading comprehension requires that readers have rich lexical knowledge that can be retrieved
quickly and used flexibly to derive appropriate contextual meaning. Children with poor reading
comprehension often have deficits in lexical knowledge: They have smaller vocabularies (Catts
et al., 2006; Nation et al., 2004) and are less sensitive to semantic relationships or multiple
meanings (Henderson, Snowling, & Clarke, 2013; Nation & Snowling, 1999) than are typical readers.
Studies using event-related potentials have documented neurological evidence of semantic processing
deficits in poor comprehenders (Landi & Perfetti, 2007; Yang et al., 2005) and retrospective
studies have shown that poor comprehenders have vocabulary weaknesses that are present in the
preschool years (Catts et al., 2006; Elwer, Keenan, Olson, Byrne, & Samuelsson, 2013; Justice,
Mashburn, & Petscher, 2013; Nation, Cocksey, Taylor, & Bishop, 2010). Thus, there is good evidence
that reading comprehension difficulties are associated with both poor word reading and poor
vocabulary, and a theoretical basis that each may contribute to reading comprehension outcomes.

Perfetti et al. (2014) group word-level processes together as a possible pressure point and there
is empirical support for that position. For example, vocabulary skills contribute to competence in
word reading (Language and Reading Research Consortium, 2015a; see also, Metsala, 1999; Ouellette
& Beers, 2010; Tunmer & Chapman, 2012), suggesting an association between the two. However, there
is also an empirical basis to consider word reading and vocabulary as separate word-level pressure
points. First, word reading and vocabulary knowledge make distinguishable contributions to the
concurrent prediction of reading comprehension in Grades 1 to 4 (Cain, Oakhill, & Bryant, 2004;
Richter, Isberner, Naumann, & Neeb, 2013). In addition, precursors of decoding (e.g., letter
knowledge and phonological awareness) and vocabulary knowledge measured before Grade 1 make
separable contributions to reading comprehension over time through their respective influence on
word decoding and listening comprehension (Kendeou, van den Broek, White, & Lynch, 2009; Storch &
Whitehurst, 2002). Second, when we consider children who have poor reading comprehension in the
presence of age appropriate word reading, not all have weak vocabulary skills (Cain & Oakhill,
1999; Ehrlich, Remond, & Tardieu, 1999; Tong, Deacon, Kirby, Cain, & Parrila, 2011). Such findings
question how best to conceptualize the interrelations between these different subcomponents of
reading. Thus, in our analyses we consider word reading and vocabulary separately, as
distinguishable pressure points, to determine whether they each make unique contributions to
reading comprehension across the ability range, or should indeed be grouped together as word-level
processes (e.g., Perfetti et al., 2014).

Another language skill related to reading comprehension is grammar. This was not considered as a
separate candidate pressure point by Perfetti and colleagues. The understanding of individual
sentences is necessary to construct the mental model of the text’s meaning. Grammatical cohesive
devices serve a clear integrative function enabling the meanings of successive clauses and
sentences to be combined (Halliday & Hasan, 1976). Grammar predicts early reading comprehension
outcomes (Muter, Hulme, Snowling, & Stevenson, 2004) and poor comprehenders show weaknesses on
measures of grammar and morphosyntax (Adlof & Catts, 2015; Catts et al., 2006; Marshall & Nation,
2003; Stothard & Hulme, 1992; Tong, Deacon, & Cain, 2014). In addition, grammar forms a distinct
language dimension from vocabulary and higher-level language by Grade 3 (Language and Reading
Research Consortium, 2015b). Given this backdrop, we examined the role of grammar as an additional
language pressure point in the reading comprehension process.

Another category of language pressure points considered by Perfetti et al. (2014) is higher-level
comprehension processes, such as inference making and comprehension monitoring, which enable
readers to combine word meanings to form coherent sentences and to integrate these to construct a
coherent mental model. Poor comprehenders matched to good comprehenders for word reading and sight
vocabulary have weak inference making (Cain & Oakhill, 1999) and comprehension monitoring (Ehrlich
et al., 1999), making these higher-level language skills a candidate source of their comprehension
difficulties, separate from word-level processes. In addition, higher-level language forms a
separable dimension to vocabulary and grammar from around Grade 1 (Language and Reading Research
Consortium, 2015b) and predicts reading comprehension in addition to vocabulary and grammar
(Oakhill & Cain, 2012). For these theoretical and empirical reasons, we consider inference and
comprehension monitoring together as a higher-level language pressure point (as do Perfetti and
colleagues).

There are contrasting accounts of the relative importance of these different oral language skills
(vocabulary, grammar, and higher-level language skills) to listening and reading comprehension.
Some consider the oral language skills of vocabulary and grammar as primary predictors of reading
and listening comprehension outcomes (Hulme & Snowling, 2011) and higher-level language skills as
a secondary pressure point, resulting from weaknesses in basic skills further down the language
processing chain (see also Perfetti et al., 2014). However, empirical work that shows separate
prediction of reading and listening comprehension from lower-level skills (vocabulary and grammar)
and higher-level skills (inference making) (Lepola, Lynch, Laakkonen, Silven, & Niemi, 2012;
Oakhill & Cain, 2012; Silva & Cain, 2015) supports a more nuanced model. Specifically, this work
suggests that higher-level language is an independent predictor of passage-level comprehension
from a young age (also see Kendeou et al., 2009). If weak higher-level language is an independent
source of reading comprehension failure, it should predict variance in the lower ability range of
reading comprehension, even when foundational oral language skills (e.g., vocabulary and grammar)
are controlled. However, another possibility is that higher-level language skills are more
influential predictors of reading comprehension in older and better comprehenders than in younger
and poorer readers, because they are more critical to performance on the challenging texts that
these readers encounter in everyday reading, as well as in standardized assessments (e.g., Adlof,
Perfetti, & Catts, 2011). If so, we would find independent prediction by these skills only at the
higher end of the reading comprehension ability range.

Another type of pressure point considered by Perfetti and colleagues is cognitive resources such
as working memory. Working memory is the mental workspace in which language processing and the
construction of the mental model take place. Poor comprehenders have weak working memory (Cain,
2006; Carretti, Borella, Cornoldi, & de Beni, 2009; Nation, Adams, Bowyer-Crane, & Snowling,
1999). Critically, measures of memory that tap the executive component of working memory (i.e.,
tasks that require both processing and storage of information) are unique predictors of poor
reading comprehension; in contrast, memory tasks that tap only phonological storage are
specifically related to decoding problems (Swanson & Berninger, 1995). Such working memory
weaknesses could affect the accurate storage of the information needed to make long distance
inferences within a text and the integration of new information with the mental model, leading to
reading comprehension failure. In support of this view, working memory is related to differences
between good and poor comprehenders in inference making ability (Cain, Oakhill, & Lemmon, 2004)
and comprehension monitoring (Oakhill, Hartt, & Samols, 2005). Critical to our research aims, it
is necessary to determine whether the prediction of reading comprehension by higher-level language
skills is independent, or due to their dependence on working memory.

We also consider whether working memory is itself a primary or secondary pressure point. In
support of the first position, weak working memory is evident in poor comprehenders in the
presence of intact lexical processes (age appropriate word reading and vocabulary knowledge; Cain,
2006; Yuill, Oakhill, & Parkin, 1989). Further, working memory makes a unique contribution to the
prediction of reading comprehension in young readers (Cain, Oakhill, & Bryant, 2004). Other work
supports the alternative position that working memory is a secondary pressure point, with
weaknesses in working memory arising from word-level difficulties (Nation et al., 1999; Perfetti,
1985): That is, slow or inefficient lexical processes might limit the available resources in
working memory for the higher-level integrative skills needed to construct the mental model.

Our review demonstrates an inconclusive picture of the candidate causes of reading comprehension
failure: There is evidence that each of the proposed types of pressure point is both a primary and
a secondary source of reading comprehension difficulties. One difficulty in interpreting these
previous studies is that the majority have investigated each pressure point individually in
relation to reading comprehension. A handful of studies have examined their unique and shared
contributions to reading comprehension (Cain, Oakhill, & Bryant, 2004; Oakhill & Cain, 2012; Catts
et al., 1999; Vellutino et al., 2007), but these are limited because they have relied on a single
measure of each construct, resulting in a narrow sampling of each construct that is also prone to
measurement error. In addition, this work, like most predictive research in reading, has operated
with the underlying assumption that variables are equally predictive for all participants.
Confidence intervals around such estimates give an idea about how similar the effect is for
participants, but the fact remains that the interpreted estimates are “averaged” across children
with different levels of reading comprehension ability. This approach does not allow us to
determine whether the factors that are predictive of poor reading comprehension are the same as
those for average or good reading comprehension.

Research on the component skills of reading comprehension has not tested this issue directly but
indicates that it warrants our attention: Each of the factors reviewed above explains unique as
well as shared variance in reading comprehension, but the strength of the contribution can differ
by age (Cain, Oakhill, & Bryant, 2004; Vellutino et al., 2007). Taking age as a proxy for ability
level, these findings suggest that a given variable may not be equally predictive across the
ability range, a pattern that has been found for other reading-related measures, such as naming
speed (Johnston & Kirby, 2006). The literature on English language learners (ELL), although not
the focus of this study, also points to the need to examine the prediction of reading
comprehension across different ability groups: ELL and monolingual groups differ not only in
reading comprehension level, but in the language skills that significantly predict reading
comprehension in each group (Geva & Farnia, 2012).

In summary, reading comprehension is a complex construct informed by a range of lower- and
higher-level language skills, which draw on cognitive resources. Theoretically, each of these
language and cognitive skills may make an independent contribution to the prediction of reading
comprehension and there is broad empirical support for this. On examination, the empirical work
indicates that the relationship between these different factors and reading comprehension may be
specific to reader profile, but research studies to date have not directly addressed this issue.
One approach that can address this gap in our knowledge is quantile regression.

Quantile  regression  uses  a  weighting  procedure  to  estimate  the 
relationship  between  a  predictor  variable  and  an  outcome  variable 
at  several  specified  points  in  the  distribution  of  the  outcome 
variable.  As  such,  it  allows  for  the  comparison  of  the  factors 
related  to  poor  versus  good  comprehension  while  at  the  same  time 
using  data  across  the  entire  ability  range.  This  technique  has 
usefully  demonstrated  that  the  contributions  of  heritability  and 
shared  environmental  influences  change  across  the  reading  ability 
range  (Logan  et  al.,  2012),  that  different  approaches  to  estimating 
oral  reading  fluency  can  have  different  levels  of  predictability  for 
good  versus  poor  readers  (Petscher  &  Kim,  2011),  and  that  floor 
effects  can  lower  the  predictability  of  screening  instruments  (Catts, 
Petscher,  Schatschneider,  Sittner  Bridges,  &  Mendoza,  2009). 
These  studies  demonstrate  the  sensitivity  of  this  approach  for 
uncovering  nonlinear  relationships  that  may  be  missed  by  other 
statistical  approaches. 
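
As a rough illustration of the weighting idea described above, the sketch below fits a single-predictor quantile regression on synthetic data. It is not the analysis reported in this article: the data, the function names (`pinball_loss`, `fit_quantile_line`), and the brute-force search are all assumptions made for illustration. The asymmetric "check" loss gives weight q to positive residuals and 1 - q to negative ones, so different values of q pull the fitted line toward different parts of the outcome distribution. The search relies on the known property that, with one predictor, at least one optimal line passes through two of the data points.

```python
from itertools import combinations


def pinball_loss(residual, q):
    """Check loss: weight q on positive residuals, (1 - q) on negative ones."""
    return q * residual if residual >= 0 else (q - 1) * residual


def fit_quantile_line(xs, ys, q):
    """Fit y = intercept + slope * x at quantile q by exhaustive search.

    Tries every line through two data points with distinct x values and
    keeps the one with the smallest total check loss.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line, not a valid regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(
            pinball_loss(y - (intercept + slope * x), q) for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Synthetic (hypothetical) data: the spread of the outcome grows with the
# predictor, so the predictor matters more at high quantiles than at low ones.
xs, ys = [], []
for x in range(1, 11):
    for e in (-0.2, -0.1, 0.0, 0.1, 0.2):
        xs.append(float(x))
        ys.append(2.0 + 0.5 * x + e * x)

slopes = {q: fit_quantile_line(xs, ys, q)[1] for q in (0.1, 0.5, 0.9)}
for q, slope in slopes.items():
    print(f"quantile {q}: slope = {slope:.2f}")
```

In this toy data set the slope rises from about 0.3 at the 0.1 quantile to about 0.7 at the 0.9 quantile, whereas a single mean-based estimate would report one slope for everyone. Practical analyses would normally use a statistical package with a linear-programming solver and standard errors; the exhaustive search here is only feasible for small samples and one predictor.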

Our aims for the present study were to determine which language and cognitive factors are related
to reading comprehension in third-grade children and to investigate if these factors are the same
or different at various levels of comprehension. To do so, participants completed multiple
measures of word recognition, vocabulary, grammar, higher-level language, working memory, and
reading comprehension. This provided a broader sampling of these constructs than in previous
research, reducing measurement error, and enabling greater generalization of our findings.
Critically, quantile regression analyses were conducted. On a theoretical level, quantile
regression can elucidate the relations between different language and cognitive skills and the
nature of their influence on reading comprehension success and failure (i.e., consistent or not
across the ability range). On a practical level, documenting the skills that influence reading
comprehension for good and poor readers may also assist practitioners in developing approaches for
early identification of, and intervention for, comprehension problems. For example, those skills
most closely related to poor comprehension may be targets for assessment and/or intervention
protocols.

Method 

Participants 

The participants were part of a larger comprehensive longitudinal investigation of reading
comprehension in preschool to third-grade children. Children were selected from four sites in
different regions of the United States with school districts selected for size and diversity of
the student populations, as well as willingness to participate. Teachers received recruitment
packets to send home for all students in their class. Among those children whose parents consented
to participation, we randomly selected an approximately equal number of children per site per
grade to receive our assessment battery. The sample for the current study included 245 children
(mean age = 8.58 years) who had completed the third-grade assessment battery. Table 1 shows the
mean age, income status, gender, ethnicity, percentage of free/reduced lunch, and special
education status of participants. Note that our sample had a disproportionate percentage of
children with family income in the higher bracket. This, no doubt, had some impact on the results
reported below. However, because we examined children’s performances across the reading
comprehension distribution, our sampling procedure likely influences the interpretation of our
results less than if only mean performances were considered.

Table 1
Participant Characteristics

Characteristic                                Percentage of sample
Individualized education plan                 6%
Income (categorical)
  <30K                                        11%
  31K-60K                                     17%
  60K-85K                                     20%
  >85K                                        41%
  Did not report                              11%
Free/reduced price lunch
  Yes                                         18%
  Did not report                              10%
Gender
  Female                                      52%
Race (participants could select multiple)
  White/Caucasian                             81%
  African American                            5%
  Asian                                       5%
  Did not report                              10%
Ethnicity
  Hispanic/Latino                             6%
  Not Hispanic/Latino                         83%
  Did not report                              10%

Measures 

Our assessment battery included multiple measures of vocabulary, grammar, higher-level language
processing, working memory, word recognition, and reading comprehension. All standardized measures
had adequate psychometrics as reported in cited manuals or research reports. Cronbach’s alphas
were also calculated for both standardized and nonstandardized measures and are presented in
Table 2.
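
For readers unfamiliar with the reliability index mentioned above, the following minimal sketch computes Cronbach’s alpha from item-level scores using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy scores are hypothetical and are not drawn from the study’s data or scoring procedures.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: three children (rows) on two items (columns).
toy_scores = [[2, 3], [4, 4], [6, 8]]
alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when items rank children consistently and falls as item-level noise grows, which is why short tasks with few items tend to yield lower values.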

Vocabulary. Three measures of vocabulary were administered. The Peabody Picture Vocabulary Test-4
(Dunn & Dunn, 2007) assessed children’s recognition of the meaning of spoken words. The Expressive
Vocabulary Test-2 (Williams, 2007) assessed expressive vocabulary. Participants also completed the
Word Classes 2 (Expressive & Receptive) subtest from the Clinical Evaluation of Language
Fundamentals-4 (CELF-4; Semel, Wiig, & Secord, 2003), which assessed their ability to understand
relationships between words that are related by semantic class features and to orally express the
similarities and differences concerning those relationships (e.g., cat, whiskers, nest, which of
these go together; why).

Grammar. Four measures of grammar were administered. The Word Structure subtest of the CELF-4
(Semel et al., 2003) assessed children’s abilities to apply word structure rules or select
appropriate pronouns (e.g., The boy likes to read. Every day he _____). The Recalling Sentences
subtest of the CELF-4 assessed children’s ability to listen to spoken sentences of increasing
length and complexity and repeat them without changing meaning or sentence structure (e.g., The
girl stopped to buy some milk, even though she was late for class). The Test for Reception of
Grammar-Version 2 (Bishop, 2003) assessed understanding of grammatical structures. In this task,
children were asked to point to the picture that corresponded to a spoken sentence (e.g., The man
the elephant sees is eating). A morphological derivation task described by Wagner and colleagues
(Wagner, n.d.) assessed knowledge of derivational morphology. The assessor presented children with
a base word (e.g., farm) and an incomplete sentence for which children provided a derived form of
the base (e.g., My uncle is a _____).

Higher-level language. Three measures of higher-level language were administered. A
researcher-developed measure based on the work of Cain and Oakhill (Cain & Oakhill, 2006; Oakhill
& Cain, 2012) was used to assess comprehension monitoring. The comprehension monitoring task
included five practice stories and 12 test stories that were either entirely consistent or
included inconsistent information. Children listened to each and were asked whether it made sense
and, if not, what was wrong with the story. Children received a point for each inconsistent story
for which they correctly identified the incorrect information. A second researcher-developed
measure based on work by Oakhill and Cain (2012) and Stein and Glenn (1982) assessed children’s
text structure knowledge, specific to ordering narrative events into a causally and temporally
coherent sequence. In this story arrangement task, children were told that they would read some
sentences that tell a story, but the story is out of order. The assessor then showed the children
a set of six to 12 cards, with one sentence typed on each card, in a fixed order and read each
sentence aloud to the child. The child was asked to rearrange the sentences to put them in the
correct sequence. This measure consists of one practice story and four test stories. A third
researcher-developed measure, an inferencing task based on work by Cain and Oakhill (1999) and
Oakhill and Cain (2012), was used to assess children’s ability to generate two types of inferences
from short narratives: inferences that require integration of two premises, and inferences that
require integration of information in the text with background knowledge to fill in missing
details. Following administration of a practice story, children listened to two stories, after
which the assessor asked eight questions, reflecting four questions per inference type. For the
integration type, the children were asked a question such as “Why did they have no money for the
bus (they had spent it on other things)?” For the background knowledge type, children were asked a
question such as “Why did they get wet on the way home (story mentioned that it had rained)?” As
seen in Table 2, Cronbach’s alpha for the comprehension monitoring task was adequate (.75),
whereas the alphas for the other tasks fell short of commonly accepted cutoffs (.45-.67). However,
the influence of the low reliability of the latter measures was minimized by the use of a latent
variable for higher-level language.

Table 2
Factor and Descriptive Information for Variables in the Six Factor Analyses

                              Factor   Correlated  Cronbach’s
Measures                      loading  errors      alpha       Min  Max  M       SD     Skewness  Kurtosis
Reading comprehension
  Gates-MacGinitie            .93                  .91         8    48   33.15   9.12   -.64      -.30
  Reading comprehension       .74                  .80         3    27   19.53   4.80   -.85      .46
  Passage comprehension       .77                  .89         6    56   36.94   6.30   -.68      3.46
Vocabulary
  PPVT-4                      .78                  .95         99   196  151.05  16.96  -.12      .10
  EVT-2                       .78                  .95         59   151  114.58  14.22  -.13      .40
  Word classes receptive      .88                  .80         1    20   11.25   3.20   -.14      .32
  Word classes expressive     .83                  .75         0    14   6.56    2.64   .15       -.27
  (PPVT-4 with EVT-2)                  .36
Grammar
  Word structure              .67                  .63         18   32   27.91   2.87   -.89      .36
  Recalling sentences         .74                  .92         24   95   65.28   13.99  -.18      -.32
  TROG                        .76                  .78         4    20   15.93   2.81   -1.09     1.32
  Morphological derivation    .79                  .80         3    26   15.62   4.54   -.30      .00
Higher-level language
  Comp monitoring             .70                  .75         0    8    5.90    1.77   -1.16     1.08
  SAT                         .52                  .67         0    4    1.70    1.46   .28       -1.27
  Inference-integration       .37                  .45         0    2    1.40    .43    -.64      -.09
  Inference-background        .49                  .47         0    2    1.64    .33    -1.39     2.94
  (Integration w/background)           .46
Word reading
  Word identification         .92                  .96         27   92   67.88   9.06   -.25      1.19
  Word attack                 .86                  .93         4    44   30.14   7.83   -.64      .05
  TOWRE-SWE                   .69                  —           21   90   64.29   9.61   -.80      2.38
  TOWRE-PDE                   .85                  —           1    60   31.64   11.13  -.15      -.34
  (SWE with PDE)                       .38
Memory
  Numbers reversed            .61                  .70         2    20   11.37   2.82   .19       .62
  Auditory memory             .73                  .82         0    31   19.72   5.39   -.57      .31
  Recalling sentences         .74                  .92         24   95   65.28   13.99  -.18      -.32
  Memory updating task        .45                  .84         2    26   12.56   4.61   .28       .12

Note. Factor loading and correlated errors are factor information; the remaining columns are
descriptive information. PPVT-4 = Peabody Picture Vocabulary Test-4; EVT-2 = Expressive Vocabulary
Test-2; TROG = Test for Reception of Grammar; SAT = Story Arrangement Task; TOWRE-SWE = Test of
Word Reading Efficiency-Sight Word Efficiency; TOWRE-PDE = Test of Word Reading Efficiency-Phonemic
Decoding Efficiency.

Working memory. Three measures of working memory were administered. They included two subtests
from the Woodcock-Johnson III Normative Update Tests of Cognitive Abilities (Woodcock, McGrew, &
Mather, 2001). The Numbers Reversed subtest measures short-term memory. Children listen to a
series of numbers, which they repeat back in reversed order. The Auditory Working Memory subtest
measures working memory or divided attention. Children listen to a series of both digits and
objects and are then asked to reorder the series, saying the objects, followed by the digits, in
sequential order. A researcher-developed measure, the Memory Updating Task, based on the work of
Belacchi, Carretti, and Cornoldi (2010), assessed the ability to modify the contents of working
memory using comparison of objects; for example, as part of the assessment for one item the
assessor would say, “This time you will hear five words. I want you to tell me the names of the
two smallest things.”
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Word  reading.  Four  measures  of  word  reading  were  admin¬ 
istered.  They  included  two  subtests  from  the  Woodcock  Reading 
Mastery  Tests-Revised:  Normative  Update  (Woodcock,  1998). 
The  Word  Identification  subtest  measured  children’s  ability  to 
accurately  pronounce  printed  English  words  ranging  from  high  to 
low  frequency  of  occurrence.  The  Word  Attack  subtest  assessed 
children’s  ability  to  read  pronounceable  nonwords  varying  in 
complexity.  We  also  administered  two  subtests  of  the  Test  of  Word 
Reading  Efficiency-Second  Edition  (Torgesen,  Wagner,  &  Ra- 
shotte,  2011).  The  Sight  Word  Efficiency  subtest  measured  how 
many  printed  English  words,  which  ranged  from  high  to  low 
frequency  of  occurrence,  children  could  accurately  pronounce  in 
45  s.  The  Phonemic  Decoding  Efficiency  subtest  assessed  how 
many  pronounceable  nonwords,  which  varied  in  complexity,  chil¬ 
dren  could  accurately  pronounce  in  45  s. 

Reading comprehension. Three measures of reading comprehension
were administered. The Gates-MacGinitie Reading Tests
(MacGinitie, MacGinitie, Maria, & Dreyer, 2002) assessed children’s
ability to read one or more sentences and select from four
corresponding pictures the one that matched the meaning of the
sentences. The Reading Comprehension Measure was an experimental
measure adapted from the Qualitative Reading Inventory-5
(QRI-5; Leslie & Caldwell, 2011). Children read two narrative and
two expository passages silently and notified the examiner when
each passage had been read. The examiner asked sets of open-ended
inferential and noninferential questions after each one. The
narrative passages came from the QRI-5, and the expository passages
were created specifically for this project and matched the
grade-appropriate passages from the QRI-5 in terms of approximate
length and Lexile score. Children’s responses to administered
questions were audio-recorded and were postscored based on a
rubric of acceptable answers. Interrater reliability was acceptable
with an ICC of 0.86. Finally, the Passage Comprehension subtest
of the Woodcock Reading Mastery Tests-Revised: Normative Update
(Woodcock, 1998) was administered. This measure was a
cloze task that required children to read a series of sentences or
short passages and add the missing word(s).

Procedures 

Assessors underwent comprehensive measurement training and
in-lab observations to ensure consistent training, measurement
administration, and fidelity across sites. At two testing sites, measures
were administered during 1-h testing blocks in children’s
schools. In the other two sites, assessments were administered
during 3-6 h blocks at weekends, and frequent play breaks were
taken to ensure children were attentive during test administration.
With the exception of the Gates-MacGinitie, which is a standardized
group-administered test, all measures were administered
individually.

Analyses 

Our goal was to examine how selected component skills (pressure
points) are related to individual differences in reading comprehension
and whether the same predictors are important for all
levels of children’s reading comprehension. In preliminary analyses,
we developed a latent representation of each construct. Next,
we examined the relationships of each construct to reading comprehension,
as well as the unique contributions of each construct to
reading comprehension using an ordinary least squares (OLS)
framework, and then replicated the same analyses in a quantile
regression framework.

Preliminary analyses. As noted in previous sections, each of
the six theoretical constructs of interest was tapped by several
unique measures (see Measures section for detailed information
about each measure). To derive one representation for each construct,
we calculated latent factor scores. The use of latent factors
offers several advantages over using either individual observed
(manifest) variables or a composite score (averages across multiple
observed variables) approach. Relative to individual observed variables,
latent representations are relatively free of measurement error, thus
yielding more accurate representations of the underlying relations
between measured constructs. Relative to composite scores, latent
representations have several advantages. First, individual measures are
not forced to contribute equally to the development of the factor. Second,
we can further reduce error by allowing observed variables that share
method variance (or are subtests of the same larger measure) to
have correlated error variances as necessary. Third, unlike composite
scores, latent approaches provide methods to explicitly
measure how well the model fits the data. In the present study, six
individual factor analyses were conducted to extract latent variable
representations for vocabulary, grammar, higher-level language,
word reading, memory, and reading comprehension.1 The factors
were calculated and extracted in Mplus v6.0 using the regression
method and maximum likelihood estimation. All error variances
between observed scores were first constrained to be independent,
and then relaxed and allowed to be estimated as suggested by
modification indices.

The fit of each model was assessed by examining the factor
loadings, factor reliabilities, factor determinacies, and static fit
indices (comparative fit index [CFI], Tucker-Lewis index [TLI],
root mean square error of approximation [RMSEA], and standardized
root mean square residual [SRMR]), with results presented across
two separate tables. Table 2 provides the standardized factor
loadings of each measure onto its respective construct, as well as
the standardized paths for any included correlated errors. Factor
loadings indicated that all observed measures loaded sufficiently
well on their respective construct (>0.4; Kline, 2013). Table 3
provides additional model fit indices. Construct reliabilities were
calculated using Hancock and Mueller’s Coefficient H (Hancock
& Mueller, 2011), which describes the relation between the latent
construct and its measured indicators, drawing information from
all indicators in a manner commensurate with their ability to
reflect the construct (values at or above .90 indicate a reliable
construct). Factor determinacy values range from 0 to 1 and
indicate how well the factor score correlates with the factor (a
larger value denotes a better fitting model). Static model fit indices
included the CFI and TLI (values above 0.90 indicate good model
fit) and the RMSEA and SRMR (values less than .05 indicate good
model fit; Kline, 2013). Examining Table 3, there were a few
instances where an individual factor did not meet all model fit


1 Note that our previous theoretical work with this sample identified that
vocabulary, grammar, and higher-level language were unique but correlated
aspects of language; thus, these were estimated following the same
method in this examination (Language and Reading Research Consortium,
2015b).

Table 3

Correlations Between Extracted Factor Scores, Coefficient H, Factor Determinacies, Model Fit
for Each Factor Analysis, and Variance Components of Extracted Factors

                        Reading                              High-level   Word
                        comprehension  Vocabulary  Grammar   language     reading  Memory
Factor correlations
  Vocabulary            .730
  Grammar               .772           .757
  High-level language   .628           .513        .671
  Word reading          .597           .614        .585      .344
  Memory                .658           .639        .785      .480         .516     -
Model fit
  Coefficient H         .96            .98         .95       .87          .97      .96
  Factor determinacy    .95            .94         .91       .79          .96      .87
  Skewness              -.66           -.13        -.67      -.88         -.49     -.16
  CFI                   1.00           1.00        1.00      1.00         .94      1.00
  TLI                   1.00           .98         .99       1.03         .81      1.00
  RMSEA                 .00            .08         .05       .00          .29      .00
  SRMR                  .00            .01         .01       .01          .02      .02
Variance components
  Tau                   .10            .15         .13       .06          .10      .12
  Sigma squared         .81            .74         .71       .57          .82      .64
  ICC                   .11            .17         .15       .09          .11      .16

Note. All correlations were significantly different from zero, p < .05. CFI = comparative fit index; TLI =
Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean
square residual; Tau = variance component attributable to classrooms; Sigma squared = error variance; ICC =
intraclass correlation or the percent of variance in the factor that is attributable to classrooms.


criteria (e.g., Word Reading shows an RMSEA considerably larger
than .05). However, contemporary practices suggest that model fit
indices should be considered as a collective rather than relying on
one solitary index (Lomax, 2013); thus, taken together, these results
indicate that all six models fit the data well.
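
As a rough illustration of how Coefficient H behaves, the sketch below implements the
formula as it is commonly stated, H = 1 / (1 + 1 / Σ[λ² / (1 - λ²)]), where λ denotes a
standardized factor loading. The loadings are hypothetical and are not the values from
Table 2, and the function name is an assumption made for this example; it is not the code
used in the study.

```python
def coefficient_h(loadings):
    """Coefficient H from standardized loadings.

    Assumes every loading lies strictly between -1 and 1 and that at least
    one loading is nonzero.
    """
    # Each indicator contributes its squared loading relative to its unique variance.
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / ratio_sum)


# Hypothetical standardized loadings for a three-indicator construct.
hypothetical_loadings = [0.9, 0.8, 0.7]
h_three = coefficient_h(hypothetical_loadings)

# Adding a weaker (hypothetical) indicator adds a non-negative term to the sum,
# so H cannot decrease.
h_four = coefficient_h(hypothetical_loadings + [0.4])

print(round(h_three, 3), round(h_four, 3))
```

Because each indicator adds a non-negative term to the sum, H is never smaller than the
largest squared loading, which is one way to read the statement that the coefficient draws
on all indicators in proportion to how well they reflect the construct.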

Prior to entry in inferential statistical analyses, we examined the
missing data in the extracted factors. No variables showed more
than 1% of data missing, and Little’s MCAR test was nonsignificant,
consistent with data missing completely at random (χ² = 1.91, df = 5,
p = .862). Missing data were therefore addressed by listwise deletion,
and both the OLS and quantile regressions reported in subsequent
sections should not be biased by the missingness. Analyses were
conducted in R (R Core Team, 2012): The lm function was used for OLS
regression and the quantreg package for the quantile regressions.

Quantile regression. An important point to remember about
the OLS estimates is that they are designed to represent the best
overall estimate for all students, and therefore are most representative
of students with the average level of reading comprehension.
A critical innovation of this study was to determine
whether these relationships differed depending on children’s
reading comprehension ability. To do this, we used quantile
regression analysis to examine how each construct was related
to reading comprehension individually at different quantiles,
and how constructs were uniquely related to reading comprehension
while controlling for the influences of the others. Our
questions are well suited to quantile regression, as this technique
allows for the estimation of relations between a dependent
and independent variable at multiple locations (i.e., quantiles)
of the dependent variable. Quantile regression calculates
the strength of these relations without creating subgroups
(which would violate the normality assumption of OLS regression).
Rather, it uses every observation when estimating the
relations at a given point in the distribution, but each observation
is weighted differentially depending on its proximity to the
quantile being estimated; points that are closer get a stronger
weight, and those farther away are assigned a weaker weight.
Therefore, the estimates of the relation at each point are unique
to that point. The resulting estimates of a
quantile regression are called conditional estimates. The conditional
estimates at the median, for example, would be represented
by a single line through a scatterplot of points. But,
rather than an average estimate of the entire sample as is the
result of the OLS regression, the quantile regression estimates
the strength of the relation at each selected point along the
distribution of reading comprehension. Because of this
weighted method of estimation, quantile regression makes no
assumptions about the variance in the residual error terms, no
assumptions about the functional form of the relation, and is robust
to outliers and nonnormally distributed data (Koenker, 2005).
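
One standard way to formalize the differential weighting of observations is the check
(pinball) loss described by Koenker (2005), in which positive and negative residuals
receive the asymmetric weights tau and 1 - tau. The sketch below is a minimal
illustration in Python, not the analysis code used in this study (which relied on the
quantreg package in R); the scores and function names are hypothetical. It uses an
intercept-only model to show that every observation enters the loss and that the value
minimizing the loss shifts as tau changes.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weight tau for residuals >= 0, weight 1 - tau otherwise."""
    return residual * tau if residual >= 0 else residual * (tau - 1)


def total_loss(scores, candidate, tau):
    """Sum of check losses when every score is predicted by one constant."""
    return sum(check_loss(y - candidate, tau) for y in scores)


def intercept_only_quantile(scores, tau):
    """Constant that minimizes the total check loss.

    For an intercept-only model a minimizer is always one of the observed
    values, so searching over the data is sufficient.
    """
    return min(sorted(scores), key=lambda c: total_loss(scores, c, tau))


# Hypothetical standardized reading comprehension scores (n = 11).
scores = [-1.8, -1.1, -0.6, -0.2, 0.0, 0.3, 0.5, 0.9, 1.2, 1.7, 2.4]

low = intercept_only_quantile(scores, 0.2)
mid = intercept_only_quantile(scores, 0.5)
high = intercept_only_quantile(scores, 0.8)
print(low, mid, high)
```

For these eleven hypothetical scores the minimizers are the third, sixth, and ninth
ordered values. A regression version replaces the constant with a linear function of the
predictors and minimizes the same loss.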

In the current study, we chose to estimate the relations
between the constructs at nine points in the distribution of reading
comprehension (the .10 quantile to the .90 quantile) to give as
few estimates as possible, while still providing an overall
representation of how the functional relationship changes along
the distribution of reading comprehension. Critically, the results
for the reported quantiles would not vary if additional points
were selected; these estimates are representative only of the
point described and not of a group of surrounding points.
Because the data had a partially nested structure (factor ICCs
ranged from .10 to .17; see Table 3), all significance tests (t and
F critical values) were adjusted using a conservative cluster-correction
coefficient adapted from Hedges (2007), with degrees
of freedom adjusted as a function of the ICC, cluster size,
and total sample size.
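
The ICCs reported in Table 3 can be related to the variance components in the same table
through the usual definition ICC = Tau / (Tau + Sigma squared). The sketch below recomputes
them from the tabled values; the dictionary layout and function name are assumptions made
for this example. Because the table entries are rounded to two decimals, the recomputed
values can differ slightly from the reported ones in the second decimal place.

```python
def icc(tau, sigma_squared):
    """Share of total variance attributable to classrooms."""
    return tau / (tau + sigma_squared)


# (Tau, Sigma squared, reported ICC) as printed in Table 3.
table3 = {
    "Reading comprehension": (0.10, 0.81, 0.11),
    "Vocabulary": (0.15, 0.74, 0.17),
    "Grammar": (0.13, 0.71, 0.15),
    "High-level language": (0.06, 0.57, 0.09),
    "Word reading": (0.10, 0.82, 0.11),
    "Memory": (0.12, 0.64, 0.16),
}

recomputed = {name: icc(tau, s2) for name, (tau, s2, _) in table3.items()}

for name, (_, _, reported) in table3.items():
    print(f"{name}: recomputed {recomputed[name]:.3f}, reported {reported:.2f}")
```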

Results

The results are presented first for all univariate estimates, in which
each pressure point is considered as the sole concurrent predictor
of reading comprehension. Second, we present all multivariate
results, which provide evidence of each component’s unique concurrent
prediction of reading comprehension.

Univariate results. Descriptive information about the extracted
factors is presented in Table 3; all factors were
standardized to a mean of zero and a standard deviation of one.
Skewness values indicate that the distribution of each factor is
approximately normal with a slightly negative skew. Also presented
in Table 3 are between-factor correlations, which demonstrate
that, though estimated separately, all components are moderately
correlated. Of particular interest in the correlation matrix is
how each of the factors correlates with the reading comprehension
factor, reported in the first column of Table 3, as the correlation is
akin to a regression standardized beta weight. Results indicated
that each potential “pressure point” was strongly and significantly
correlated with reading comprehension, with r values ranging from
.597 to .772 (each explaining 35% to 59% of the variance in
reading comprehension when used as an individual predictor).
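
The statement that a correlation is akin to a standardized beta weight can be checked
directly: with a single predictor, the OLS slope computed on z-scored variables equals
Pearson's r, and r squared is the proportion of variance explained. The sketch below
illustrates this with hypothetical scores; the data and function names are assumptions
made for this example and are not values from the study.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation as the mean product of z-scores."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical predictor and outcome scores.
predictor = [-1.2, -0.5, 0.0, 0.4, 1.1, 1.6]
outcome = [-1.0, -0.9, 0.2, 0.1, 0.8, 1.5]

r = pearson_r(predictor, outcome)
standardized_beta = ols_slope(zscores(predictor), zscores(outcome))
variance_explained = r ** 2
print(round(r, 3), round(standardized_beta, 3), round(variance_explained, 3))
```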

The results of each individual quantile regression analysis for
each predictor are presented in Figure 1. In this figure, the x-axis
represents each selected quantile of reading comprehension, and
the y-axis represents the strength of the relation between the
predictor and reading comprehension. Note that all estimates of
these relations were found to be significantly different from zero
(all corrected p values <.001). Because the factor scores were
standardized, and each analysis only has one predictor, these
coefficients can be interpreted like correlations (ranging from -1
to +1, with 0 indicating no relation). For example, the first graph
represents the estimates relating reading comprehension with vocabulary.
At the low end of reading comprehension (.10 quantile)
the relation between reading comprehension and vocabulary is
very strong: estimate = .90. This means that two children at the
.10 quantile in reading comprehension who differ by one standard
deviation in vocabulary skill are predicted to have an almost
identical difference in reading comprehension skill (.90 standard
deviations). At the highest end of reading comprehension, the .90
quantile, the relation is relatively weaker (estimate = .54), but still
significantly different from zero. This indicates that vocabulary is
still significantly related to reading comprehension when considered
alone, even for students with excellent reading comprehension
skills. The results for the other constructs follow a similar
pattern to vocabulary. In each case, constructs tend to be more
predictive at the lower end of the distribution of reading comprehension
than the higher end, but each predictor is significantly
related to the outcome across the distribution of reading comprehension
skill.

To further examine these trends, we conducted statistical comparisons
between quantiles to test whether the relation of each
predictor is stronger at one point in the distribution than another
(Petscher & Logan, 2014). A priori we selected three points to
compare: the .20, .50, and .80 quantiles, representing the low, mid,
and high range of reading comprehension. Though all estimates are
visible in Figure 1, exact estimates of the associations between
each predictor and the outcome are presented in Table 4, along
with the results of the between-quantile comparisons. From Table
4, we see that the prediction of reading comprehension was significantly
better for poor comprehenders than good comprehenders
(as evidenced by a significant contrast of the .2 and .8 quantile
estimates) for vocabulary, grammar, higher-level language, and
word reading, but not for memory. Also for language constructs
only, the prediction of reading comprehension was significantly
better for average comprehenders (.5) than for good comprehenders
(.8), suggesting a decrease in the contribution of language
components to reading comprehension as one moves from poor to
better comprehenders.

Multivariate results. Next, OLS regression was used to examine
how all pressure points contributed to the concurrent prediction
of reading comprehension when controlling for one another
in a multiple regression. The first column of Table 5 displays the
results of the OLS multiple regression, which demonstrated that
vocabulary, grammar, higher-level language, and word reading
each explained significant unique variance in reading comprehension.
In contrast, working memory did not show any unique
predictive utility above and beyond the other four constructs.
Overall, the model accounted for 69% of the variance in reading
comprehension.

A quantile multiple regression was conducted to examine how
each construct was predictive of reading comprehension after
controlling for the others. The factor scores were all standardized
(M = 0, SD = 1). Therefore, the resulting coefficients can all be
interpreted as partial effects. In line with the simple regression
results, the quantile multiple regression was also estimated at nine
different points in the reading comprehension distribution. For
ease of comparison, the results of three of those points are reported
in Table 5 (for all nine, see Figure 2, which can be read the same
way as Figure 1, except that coefficients are partial effects). Table
5 contains the coefficients and the cluster-adjusted p values for


Figure 1. Results of quantile regression for each separate predictor of reading comprehension.
(Panels: Vocabulary, Grammar, High-Level Language, Word Reading, Memory; x-axis: Quantile of
Reading Comprehension.)

Table 4

Quantile Univariate Regression Coefficients and Comparisons
for Predictors of Reading Comprehension

                       Quantile coefficients    p-value of comparisons
Construct predictor    .2      .5      .8       .2 vs. .5   .2 vs. .8   .5 vs. .8
Vocabulary             .91     .75     .57      .104        .011        .029
Grammar                .85     .78     .65      .578        .035        .008
High language          .77     .68     .57      .056        .002        .081
Word reading           .70     .63     .45      .126        .015        .161
Memory                 .60     .57     .45      .626        .610        .513

* All quantile coefficients were significantly different from zero, with
corrected p-values of <.0001.


each predictor for the OLS regression and the quantile regression
at three different quantiles of reading comprehension. For example,
at the .20 quantile (approximately the 20th percentile), the intercept
of reading comprehension is -0.45, and the coefficient associating
vocabulary with reading comprehension (after controlling
for the effects of the other predictors) is a significant .23 (evidenced
by the confidence intervals not overlapping with zero).
Grammar, higher-level language, and word reading were also
significant predictors of reading comprehension at the .20 quantile,
but memory was not (see Table 5). These results can also be
visually compared to the OLS results. For example, the OLS
estimate places the strength of the relation between
vocabulary and reading comprehension at .27 (at the mean), which
is similar to the quantile regression result at the low end of
reading comprehension (.20 quantile) but weaker than
the result near the median of reading comprehension (.60 quantile,
where the estimate is .37).

Across the reading comprehension distribution, vocabulary,
grammar, and higher-level language were consistently significant
predictors, suggesting that these three component skills comprise
reading comprehension regardless of the skill level. The findings
from the quantile multiple regression were also consistent for
memory; memory was not a significant predictor at any of the
quantiles. In contrast, differential effects were found when examining
word reading; this construct was significant only at the lower
end of reading comprehension (when reading comprehension is at
or below the .40 quantile). This suggests that word reading is an
important component skill for children with poor comprehension,
but is not uniquely related to comprehension for children with
good comprehension. To further examine these trends, we conducted
comparisons between quantiles using the same procedure
described earlier. Only one significant difference was found: Word
reading was a significantly better predictor of reading comprehension
at the low end (.20 quantile) than the high end (.80 quantile),
corrected F(1, 82) = 4.56, p = .036.

Also included in Table 5 are estimates of the percentages of
variance in reading comprehension accounted for at each of the
three quantiles. These were calculated using a pseudo-R² (Petscher,
Logan, & Zhou, 2013), which is designed to produce an estimate
of variance explained comparable to the traditional OLS R². The
pseudo-R² values demonstrate that there is a higher percentage of
variance explained in reading comprehension for children with
poor comprehension skills (84%) in comparison to those with good
reading comprehension skills (53%). This finding is consistent
with the individual quantile regression analyses that also showed a
weaker relationship between constructs and reading comprehension
at the higher quantiles.

Discussion 

We examined how specific pressure points or areas of vulnerability
uniquely influence reading comprehension and whether or
not the unique predictors vary as a function of the level of children’s
reading comprehension. As expected, we found that word-level
semantic knowledge was significantly related to reading
comprehension: our vocabulary construct was found to be a significant
predictor of reading comprehension at all quantiles examined.
In addition, other language factors (i.e., grammar and higher-level
language) were also significant predictors of reading
comprehension, again across quantiles. Not only were language
constructs individually related to reading comprehension, but each
showed a unique relationship to reading comprehension after controlling
for the effects of the others.

Table 5

Ordinary Least Squares (OLS) and Quantile Multiple Regression Results for Predictors of
Reading Comprehension

Predictor and                   Quantile coefficients    p-value of comparisons
adjusted p             OLS      .2      .5      .8       .2 vs. .5   .2 vs. .8   .5 vs. .8
Intercept              .00      -.45    .08     .42
Vocabulary             .27      .23     .38     .32
  Adjusted p           <.001    .004    <.001   .001     .047        .345        .403
Grammar                .26      .25     .19     .24
  Adjusted p           .006     .050    .065    .025     .619        .962        .627
High-level language    .27      .33     .23     .26
  Adjusted p           <.001    .001    .005    .005     .260        .553        .705
Word reading           .16      .22     .12     .02
  Adjusted p           .003     .009    .067    .695     .181        .036        .142
Memory                 .12      .15     .13     .12
  Adjusted p           .100     .122    .102    .118     .849        .827        .931
R-squared^a            .69      .84     .70     .53

^a R-squared for quantile regression was estimated using pseudo R-squared (Petscher, Logan, & Zhou, 2013).

Figure 2. Results of quantile multiple regression showing the unique relationship of each predictor to reading
comprehension.

In another study using this same dataset, an emergent structure
for language was found. Specifically, vocabulary and grammar
represented a single construct during preschool/kindergarten but
separate constructs by third grade. In addition, higher-level language
was clearly separable from vocabulary and grammar by
third grade (Language and Reading Research Consortium, 2015b).
It was argued on the basis of those results that vocabulary, grammar,
and higher-level language represented different dimensions of
language at this grade. The present results showing that each of
these constructs explains unique variance in reading comprehension
provide further evidence for the dimensionality of language
and our hypothesis that higher-level language skills make a specific
contribution to reading comprehension outcomes independent
from that of vocabulary and grammar, in contrast to other accounts
(e.g., Hulme & Snowling, 2011). This finding is in line with
theoretical models of reading comprehension, which agree that the
product of reading comprehension is a mental model of the text’s
meaning constructed by integrating the meanings of the propositions
in the text and inferring connections between these (e.g.,
Kintsch, 1998). Further, the contribution of higher-level skills,
such as inference making, to reading comprehension cannot be
explained simply in terms of their resource demands. Inference
draws on vocabulary as well as working memory (Cain & Oakhill,
2014; Cain, Oakhill, & Bryant, 2004), but its contribution to
reading comprehension outcomes in the current study was significant
when these were controlled. Our findings suggest that these
different dimensions of language are each critical to reading comprehension
in young readers. Furthermore, the range of skills
associated with reading comprehension outcomes might be one
reason why many interventions with older poorer readers have
only moderate impacts (Edmonds et al., 2009).

Each of the language constructs by itself was more strongly
related to reading comprehension at low ability levels than at the
higher levels. Also, our multiple regression model accounted for
much less variance at the higher than lower quantiles. Whereas a
small ceiling effect in several of our constructs may have contributed
to this decline, it is unlikely to have been a major factor. In
fact, it is more probable that other factors, not considered in this
study, play a more important role in accounting for variance
among good comprehenders. One such factor may be background
information. There is considerable evidence that prior knowledge
of the topic is critical to reading comprehension in most contexts
(e.g., Compton, Miller, Gilbert, & Steacy, 2013; Kendeou & van
den Broek, 2005; Schneider, Korkel, & Weinert, 1989). Background
knowledge allows readers to better make inferences and
build coherence and memory representations of written text
(Kintsch & Rawson, 2005). It may be that this background knowledge
plays an important role in differentiating children who have
good language and other cognitive skills related to reading. Alternatively,
a likely factor that could differentiate children at the
higher end of the reading comprehension distribution is standard of
coherence, which is children’s explicit or implicit criteria for how
coherent their understanding of a passage should be (van den
Broek, Bohn-Gettler, Kendeou, Carlson, & White, 2011). Standard
of coherence is influenced by task variables, such as the purpose of
reading, but also by one’s motivation, interest in a topic or activity,
or the presence/absence of distractors or secondary tasks. The
latter seems particularly relevant in a testing situation like that in
the present study. Children with similar language and cognitive
abilities may set very different standards of coherence in this
reading activity, and as a result, vary in their ability to answer
comprehension questions. Of course, future research will be
needed to examine this and other possible factors as they relate to
good comprehension.

Among  the  language  factors,  grammar  was  the  construct  most 
closely related to reading comprehension. This finding is consistent
with evidence of grammatical problems in poor comprehenders
(Adlof & Catts, 2015; Catts et al., 2006; Cragg & Nation, 2006), as
well as the prediction of reading comprehension by grammar across time
(Muter, Hulme, Snowling, & Stevenson, 2004). Given that our grammar
construct most likely includes other skills, such as semantic knowledge
and memory (Cain, 2007), it is surprising that grammar should be such
an important
predictor  once  independent  measures  of  those  factors  were  con¬ 
trolled.  One  reason  for  the  strength  of  this  predictor  may  be  that 
grammar  serves  a  wider  integrative  function  that  extends  beyond 
individual sentence comprehension; it enables readers to integrate
across clauses and sentences to construct text-level representations.
There is empirical support for this viewpoint: children with
comprehension  difficulties  are  poor  at  pronoun  resolution,  which 
limits  their  ability  to  link  clauses  and  sentences  within  a  text 
(Oakhill  &  Yuill,  1986).  In  addition,  our  construct  of  grammar 
included  measures  of  morphological  knowledge.  Morphology  sup¬ 
ports  both  word  reading  and  reading  comprehension  (Deacon  & 
Kirby,  2004)  and,  not  surprisingly,  is  weak  in  children  with  read¬ 
ing  comprehension  difficulties  (Tong  et  al.,  2014).  Thus,  the  strong 
and  consistent  influence  of  grammar  found  here  may  be  because 
we  made  a  comprehensive  assessment  of  this  construct  that  tapped 
the  broad  extent  of  grammar  at  both  the  word-  and  text-level. 

In  line  with  our  predictions,  word  decoding  was  significantly 
and  uniquely  related  to  reading  comprehension  in  both  the  OLS 
and  quantile  regression  analyses.  However,  in  the  quantile  multiple 
regression  analyses,  word  reading  was  only  a  unique  predictor  at 
the  lower  quantiles  (.40  and  below).  These  results  are  consistent 
with  other  studies  demonstrating  that,  in  the  early  school  grades, 
word  reading  accounts  for  more  unique  variance  in  reading  com¬ 
prehension  than  at  later  grades  (Catts,  Hogan,  &  Adlof,  2005; 
Gough  et  al.,  1996;  Language  and  Reading  Research  Consortium, 
2015a).  For  higher  skilled  readers,  language  abilities  were  found  to 
be  more  uniquely  associated  with  reading  comprehension. 

Whereas  working  memory  was  related  to  reading  comprehen¬ 
sion  when  considered  by  itself,  it  did  not  explain  unique  variance 
in  either  the  OLS  or  quantile  regression  analyses.  There  may  be 
several  reasons  for  this  finding.  First,  it  has  been  argued  that  verbal 
working  memory,  which  was  how  it  was  operationalized  in  this 
study,  is  to  a  large  extent  a  reflection  of  children’s  basic  language 
ability  (Gathercole  &  Baddeley,  1993).  Children  with  good  verbal 
skills  more  quickly  activate  and  store  verbal  items  in  memory 
(Nation  et  al.,  1999)  and  recent  work  suggests  that  the  influence  of 
working  memory  on  children’s  inference  making  is  mediated  by 
vocabulary  knowledge  (Currie  &  Cain,  2015).  Further,  the  unique 
variance  in  children’s  reading  comprehension  explained  by  work¬ 
ing  memory  is  reduced  significantly  when  considered  alongside 
higher-level  language  skills  (Cain,  Oakhill,  &  Bryant,  2004),  and 
working  memory  is  not  a  unique  predictor  of  reading  comprehen¬ 
sion  longitudinally  when  considered  alongside  a  range  of  language 
skills  (Oakhill  &  Cain,  2012).  Our  findings  question  the  working 
memory  capacity  constraint  account  of  poor  reading  comprehen¬ 
sion  in  line  with  recent  studies  of  adults  (van  Dyke,  Johns,  & 
Kukona,  2014)  and  support  the  call  for  further  research  to  under¬ 
stand  better  how  language  skills  and  working  memory  interact  to 
support  reading  comprehension. 

Another  possibility  is  that  our  working  memory  measures  were 
not  representative  of  the  type  of  working  memory  that  is  important 
for  reading  comprehension.  Measures  of  verbal  working  memory 
that  involve  both  storage  and  processing  or  manipulation  of  verbal 
stimuli,  and  also  those  with  a  sentence  comprehension  component, 
are  most  strongly  predictive  of  children’s  and  adults’  reading 
comprehension (Carretti, Borella, Cornoldi, & de Beni, 2009;
Daneman  &  Merikle,  1996;  Siegel  &  Ryan,  1989).  Although  all  of 
our  working  memory  tasks  tapped  the  storage  and  processing 
resources  of  verbal  working  memory,  we  selected  tasks  that  did  not 
include  comprehension  of  sentences  to  examine  the  unique  pre¬ 
diction  of  memory  over  and  above  our  assessments  of  grammar 
and higher-level language skills. We would expect measures of
working memory such as the listening span task to be more strongly
predictive of children’s reading comprehension. But then again, such a
task would be expected to have more overlap with our language measures,
and thus explain less unique variance.

We  considered  only  child-level  pressure  points  (i.e.,  how  indi¬ 
vidual  differences  in  language  skills  were  related  to  reading  com¬ 
prehension  outcomes)  and  sought  to  explain  their  unique,  rather 
than  interactive,  influence.  Several  of  these  skills  meet  the  criteria 
for  a  pressure  point:  Theoretically  the  language  skills  we  studied 
are  integral  to  text  comprehension,  their  unique  influence  across 
the  ability  range  confirmed  their  validity  as  component  compre¬ 
hension  skills,  and  each  is  potentially  malleable  through  instruc¬ 
tion  and  intervention  (Compton  &  Pearson,  2016).  However,  our 
analytic  framework  did  not  take  into  account  text  characteristics 
and  how  these  can  interact  with  reader  characteristics  to  influence 
comprehension  (e.g.,  McNamara,  Kintsch,  Songer,  &  Kintsch, 
1996).  A  consideration  of  the  text  demands  may  help  to  explain 
why  word  reading  skills  did  not  have  a  unique  influence  on  reading 
comprehension across the ability range. It may be only when there is
a mismatch between the decoding level of the text and reader skills (as
is the case for weak decoders and younger readers) that word reading is
found to be a pressure point. Considered in a developmental context, we
might speculate that different pressure points are paramount at
different points in development. Future studies should also consider
how different reader
skills  work  in  concert  to  support  comprehension  processing  and 
how  characteristics  of  the  text  (e.g.,  decoding  level,  topic,  and  also 
cohesion)  interact  with  reader  skills  to  influence  comprehension 
(Compton  &  Pearson,  2016). 

Implications 

Our  general  aim  was  to  understand  better  the  factors  that  predict 
reading  comprehension  success  and  failure.  Our  findings  have  both 
theoretical  and  practical  implications.  First,  we  found  a  nonlinear 
relationship  between  word  reading  and  reading  comprehension; 
word  reading  was  significantly  related  to  reading  comprehension 
only  for  poor  readers.  In  relation  to  our  theoretical  frameworks, 
this  finding  indicates  that  the  impact  of  word  reading  on  reading 
comprehension  not  only  decreases  across  the  course  of  develop¬ 
ment  (Language  and  Reading  Research  Consortium,  2015a),  but 
also  across  the  ability  range,  in  line  with  the  simple  view  of 
reading.  Critically,  our  findings  extend  this  work  by  suggesting 
that  language  skills,  as  well  as  word  reading  skills,  may  exert 
different  influences  on  reading  comprehension  for  different  reader 
profiles.  In  particular,  the  impact  of  these  skills  appears  to  be  lower 
at  the  higher  ability  levels  and  factors  other  than  word  reading, 
language,  and  memory  may  be  operative  in  this  range  (see  also 
Compton  et  al.,  2014,  for  discussion  of  this  point). 

In  terms  of  instruction,  these  findings  highlight  the  need  for  a 
focus  on  a  range  of  skills,  including  word  reading,  to  support  the 
development  of  good  reading  comprehension,  as  advocated  else¬ 
where  (Snow,  2002).  In  relation  to  assessment,  our  results  show 
that  word  reading  is  not  a  proxy  measure  for  reading  comprehen¬ 
sion  and  converge  with  research  highlighting  the  need  for  reading 
comprehension  assessments  that  are  not  unduly  influenced  by 
decoding  skills  (Keenan,  Betjemann,  &  Olson,  2008).  To  improve 
the  tools  available  for  language  and  literacy  research,  we  note  that 
the  assessments  of  higher-level  language  skills  require  additional 
measurement work because internal consistency was below commonly
accepted cutoffs, particularly for the inference task. However,
measurement error was minimized by the use of more than
one  indicator  for  each  of  our  language  constructs. 

These  insights  into  assessment,  instruction,  and  the  relationships 
between  different  language  skills  and  reading  were  possible 
through  our  use  of  quantile  multiple  regression.  This  analytic 
approach  could  be  a  useful  method  to  examine  other  aspects  of 
reading  development.  By  knowing  the  relationships  across  readers 
of  varying  ability  levels,  we  not  only  could  expand  our  theoretical 
understanding  of  reading,  but  may  also  be  able  to  improve  our 
ability  to  identify  critical  pressure  points  and  enhance  our  ability  to 
identify  and  treat  poor  readers.  Quantile  regression  allows  us  the 
opportunity  to  take  advantage  of  data  across  the  full  range  of 
readers,  while  at  the  same  time  providing  information  specific  to 
children  from  the  low  ability  range.  As  such,  this  approach  may 
serve  as  a  useful  companion  approach  to  group  studies  of  children 
with  reading  disabilities. 
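
To make the logic of this approach concrete, the following minimal sketch fits a one-predictor quantile regression to a small synthetic data set by minimizing the check (pinball) loss. It is an illustration only: the data, the function names (`check_loss`, `fit_quantile_line`), and the brute-force search over lines through pairs of observations are assumptions introduced here, not the data or the estimation procedure of the present study. The toy data are constructed so that the predictor separates low scorers more than high scorers, which is the kind of pattern that quantile regression can reveal and that a single OLS slope averages over.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Return (intercept, slope) minimizing total check loss at quantile tau.

    Brute force for small samples: with one predictor, at least one optimal
    line passes through two observations, so every pair is tried.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # same predictor value: no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - (intercept + slope * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Synthetic (hypothetical) data: the spread of the outcome shrinks as the
# predictor increases, so the predictor matters most for low outcomes.
x, y = [], []
for skill in range(6):          # predictor values 0..5
    for u in range(5):          # five outcome values per predictor value
        x.append(skill)
        y.append(20 - u * (6 - skill))

# Slope of the fitted line at a low, middle, and high quantile.
slopes = {tau: fit_quantile_line(x, y, tau)[1] for tau in (0.1, 0.5, 0.9)}
```

In this toy example the fitted slope is 4 at the .10 quantile, 2 at the median, and 0 at the .90 quantile, so the same predictor is strongly related to the outcome among low scorers and unrelated to it among high scorers. For real data sets, a dedicated quantile regression routine in a statistics package would replace the brute-force search.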
Although rapid automatized naming (RAN) is one of the best predictors of reading across languages, its
nature  remains  elusive.  In  the  present  study,  we  aim  to  elucidate  the  nature  of  RAN  by  examining  the 
cognitive  and  environmental  correlates  of  RAN.  One  hundred  forty-one  second-year  kindergarten 
Chinese  children  (71  girls,  70  boys;  mean  age  =  58.99  months)  were  assessed  on  measures  of  nonverbal 
cognitive  ability,  attention,  visual  processing,  conceptual  processing,  semantic  processing,  phonological 
processing,  short-term  memory,  articulation,  speed  of  processing,  RAN  (digits  and  objects),  and  discrete 
naming.  We  also  collected  information  on  mothers’  education  and  occupation,  and  children’s  home 
learning  experiences.  The  results  showed  that  formal  home  learning  experiences,  visual  processing, 
phonological  processing,  and  articulation  were  unique  correlates  of  both  RAN  tasks.  Semantic  processing 
also  correlated  significantly  with  RAN  objects.  However,  controlling  for  the  effects  of  discrete  naming 
eliminated  the  effects  of  most  subprocesses  on  RAN.  These  findings  suggest  that  RAN  is  indeed 
multicomponential,  but  not  all  components  contribute  the  same  way  to  RAN  performance.  Theoretical 
and  practical  implications  of  these  findings  are  discussed. 

Keywords:  rapid  automatized  naming,  phonological  processing,  speed  of  processing,  visual  processing, 
articulation 


Several  studies  have  established  that  rapid  automatized  naming 
(RAN) — the  ability  to  name  as  fast  as  possible  an  array  of  highly 
familiar  stimuli  such  as  letters,  digits,  colors  and  objects — is  a 
strong  predictor  of  reading  and  a  core  deficit  in  dyslexia  (see  Kirby 
et  al.,  2010,  for  a  review).  The  popularity  of  RAN  has  grown  in  the 
last  two  decades  because  of  its  documented  success  in  predicting 
reading  in  different  languages  (e.g.,  Chinese:  Liao,  Deng,  Hamil¬ 
ton,  Lee,  Wei,  &  Georgiou,  2015;  Dutch:  de  Jong,  2011;  German: 
Landerl  &  Wimmer,  2008;  Greek:  Protopapas,  Altani,  &  Geor¬ 
giou,  2013;  English:  Parrila,  Kirby,  &  McQuarrie,  2004;  Korean: 
Kim,  2011;  see  also  Araujo,  Reis,  Petersson,  &  Faisca,  2015,  for  a 
recent  meta-analysis)  and  after  controlling  for  the  effects  of  other 
key  predictors  of  reading  such  as  letter  knowledge  (e.g.,  Kirby, 
Parrila,  &  Pfeiffer,  2003),  phonological  awareness  (e.g.,  de  Jong  & 
van  der  Leij,  1999),  orthographic  knowledge  (e.g.,  Georgiou, 
Parrila, & Kirby, 2009), and morphological awareness (e.g., Kim,
2011). Despite the fact that RAN is widely used as a predictor of
reading,  little  is  known  about  the  factors  (cognitive  and  environ¬ 
mental)  that  are  associated  with  RAN  performance.  Identifying  the 
correlates  of  RAN  performance  is  important  because  it  can  poten¬ 
tially  reveal  what  processes  underlie  RAN’s  relationship  with 
reading. 
Currently, there is no consensus among researchers regarding
the  reason(s)  why  RAN  is  related  to  reading  (Kirby  et  al.,  2010). 
Initially,  Wagner  and  Torgesen  (1987)  argued  that  RAN  is  related 
to  reading  because  they  both  rely  on  quick  access  and  retrieval  of 
phonological  representations  from  long-term  memory.  However, 
discrete  naming  (when  items  are  presented  one  at  a  time)  requires 
as  much  access  to  and  retrieval  of  phonological  representations  as 
RAN,  but  does  not  correlate  as  strongly  with  reading  as  RAN  (e.g., 
Bowers  &  Swanson,  1991;  Logan  &  Schatschneider,  2014).  Bow¬ 
ers  and  Wolf  (1993),  in  turn,  emphasized  the  extraphonological 
properties  of  RAN  and  suggested  that  RAN  predicts  reading  be¬ 
cause  of  its  contribution  to  orthographic  processing.  However, 
there  are  studies  showing  that  children  with  naming  speed  deficits 
do  not  necessarily  experience  orthographic  processing  deficits 
(e.g.,  Conrad  &  Levy,  2007;  Powell,  Stainthorp,  &  Stuart,  2014). 
Kail and Hall (1994) and Amtmann, Abbott, and Berninger (2007)
attributed  the  RAN-reading  relationship  to  domain-general  factors 
such  as  speed  of  processing  and  short-term  memory  (STM).  How¬ 
ever,  there  is  evidence  to  suggest  that  RAN  is  only  weakly  related 
to  measures  of  speed  of  processing  or  STM  (e.g.,  Georgiou  et  al., 
2009;  Swanson  &  Kim,  2007).  Finally,  Norton  and  Wolf  (2012) 
described  RAN  as  a  microcosm  of  the  later  developing  reading 
system,  tapping  many  of  the  same  processes.  Although  there  is 
some  evidence  to  support  this  theoretical  account  (e.g.,  Lervag  & 
Hulme,  2009),  the  fact  that  the  cognitive  processes  underlying 
successful  reading  performance  are  not  overly  stable  throughout 
development  (e.g.,  Cardoso-Martins  &  Pennington,  2004;  Wagner, 
Torgesen,  &  Rashotte,  1994)  suggests  that  the  basis  of  the  corre¬ 
lation  between  RAN  and  reading  at  some  skill  level  may  differ 
widely  from  that  in  another  skill  level. 
Torgesen, Wagner, Rashotte, Burgess, and Hecht (1997) suggested
that “our understanding of rapid naming ability’s relation to
reading development in general, and orthographic development in
particular, will be enhanced to the extent that we make progress in
dissecting the component skills involved in performance on rapid
naming tasks” (p. 183). A good starting point would then be to
look  at  existing  theoretical  models  of  RAN.  Wolf  and  Bowers 
(1999)  proposed  a  model  for  letter  naming  (see  Figure  1),  which 
involves  (a)  attention  to  stimulus;  (b)  visual  processes  that  are 
responsible  for  initial  feature  detection,  visual  discrimination,  and 
letter-pattern  identification;  (c)  integration  of  visual  feature  and 
pattern  information  with  stored  orthographic  and  phonological 
representations;  (d)  access  and  retrieval  of  phonological  labels;  (e) 
activation  and  integration  of  semantic  and  conceptual  information; 
and  (f)  motoric  activation.  In  addition,  because  information  must 
be  integrated  within  and  between  different  subprocesses,  speed  of 
processing  is  also  expected  to  account  for  individual  differences  in 
RAN  letters.  More  recently,  Georgiou  (2010)  proposed  a  model  for 
object  naming  according  to  which  objects  can  be  named  first  by 
linking  the  perception  of  the  object  to  a  conceptual  system  (see 
Figure  2).  After  accessing  the  semantic  lexicon,  name  retrieval 
takes  place,  which  is  predominately  a  matching  process  between 
the  semantic  information  activated  by  the  object  and  the  label  that 
is  stored  in  long-term  memory.  When  the  phonological  represen¬ 
tation  of  an  object  is  accessed,  articulation  can  be  prepared.  Wolf 
and  Bowers’  (1999)  letter-naming  model  and  Georgiou’s  (2010) 
object-naming model differ in two important respects: Georgiou’s
(2010) model does not make any reference to attentional processes
or speed of processing. On the other hand, it reserves a more
prominent role for semantic processing, which does not occupy a
central position in Wolf and Bowers’ (1999) model.

To  date,  only  a  few  studies  have  examined  the  relationship  of 
different subprocesses with RAN and have provided partly contradictory
findings (Arnell, Joanisse, Klein, Busseri, & Tannock, 2009; Decker,
Roberts, & Englund, 2013; Lervag & Hulme, 2009; Narhi et al., 2005;
Savage, Pillay, & Melidona, 2007). For example, working with a group
of 8- to 11-year-old Finnish children,
Narhi  et  al.  (2005)  found  that  verbal  fluency,  nonword  reading, 
speed  of  processing,  and  motor  dexterity  significantly  predicted 
RAN  (latent  factor  with  loadings  from  alphanumeric  [digits,  let¬ 
ters]  and  nonalphanumeric  [colors,  objects]  tasks).  Phonological 
awareness,  STM,  visual  skills,  and  executive  functions  did  not.  In 
contrast,  in  a  cross-sectional  study  that  covered  ages  5  to  12, 
Decker  et  al.  (2013)  found  that  retrieval  fluency  was  the  only 
processing  skill  that  predicted  RAN  (operationalized  with  object 
naming)  in  every  age  group.  Speed  of  processing  predicted  RAN 
only among 7- and 11-year-olds, and phonological awareness only
in  the  group  of  9-year-olds.  Attention  and  STM  did  not  predict 
RAN  in  any  age  group.  To  summarize,  the  few  studies  on  this  topic 
have  shown  that  verbal  fluency  was  related  to  RAN  and  that  visual 
skills,  short-term  memory,  and  attention  were  not.  Nevertheless, 
there  was  little  agreement  between  studies  with  regard  to  the  role 
of  visual  skills,  speed  of  processing,  and  phonological  awareness. 
Figure 1. The letter naming model proposed by Wolf and Bowers (1999). PSR = processing speed requirements.
From “The Double-Deficit Hypothesis for the Developmental Dyslexias,” by M. Wolf and P. G. Bowers,
1999,  Journal  of  Educational  Psychology,  91,  p.  417.  Copyright  1999  by  American  Psychological  Association. 
Reprinted  with  permission. 


Figure  2.  The  object-naming  model  proposed  by  Georgiou  (2010).  From 
“PASS  Cognitive  Processes:  Can  They  Explain  the  RAN-Reading  Rela¬ 
tionship?”  by  G.  K.  Georgiou,  2010,  Psychological  Science  (Chinese),  33, 
p.  1292.  Copyright  2010  by  China  National  Knowledge  Infrastructure. 
Reprinted  with  permission. 


There  might  be  two  explanations  for  the  contradictory  findings: 
First,  whereas  Narhi  et  al.  (2005)  treated  their  sample  as  one  group, 
Decker  et  al.  (2013)  performed  their  analyses  separately  for  each 
age  group.  If  the  relationship  of  different  processing  skills  with 
RAN  varies  as  a  function  of  age,  then  merging  different  age  groups 
may  have  concealed  important  differences  in  the  role  of  these 
processes  in  RAN.  Second,  there  are  differences  between  studies 
in  the  number  of  processing  skills  included  in  the  regression 
analyses. For example, Decker et al. (2013) included all measures
used  in  the  standardization  of  Woodcock-Johnson  Tests  of  Cog¬ 
nitive  Abilities — III  that  are  not  necessarily  related  to  any  of  the 
naming  models.  In  contrast,  Narhi  et  al.  (2005)  included  measures 
administered  as  part  of  a  larger  assessment  battery  for  learning 
disabilities.  Although  the  existing  studies  have  been  conducted  in 
different  languages  (e.g.,  Narhi  et  al.’s,  2005,  study  was  conducted 
in  Finnish;  Lervag  &  Hulme’s,  2009,  study  in  Norwegian;  and  the 
rest  of  the  studies  in  English),  this  difference  is  not  sufficient  to 
explain  the  contradictory  findings  because  similar  discrepancies 
can be seen even among studies conducted within the same language.
For example, visual processing was not a significant predictor
of RAN in Decker et al.’s (2013) study, but it was in Arnell
et  al.’s  (2009)  study  in  which  undergraduate  students  were  tested 
on  all  four  RAN  tasks.  Both  studies  were  conducted  in  English.  In 
addition,  we  have  no  reason  to  believe  that  domain-general  pro¬ 
cesses  such  as  visual  processing,  motor  programming,  articulation, 
or  speed  of  processing  would  behave  differently  across  languages. 

A  common  characteristic  of  the  studies  that  examined  the  fac¬ 
tors  associated  with  RAN  performance  is  that  they  all  focused  on 
the  role  of  cognitive  processes.  However,  it  is  equally  possible  that 
some  of  the  variance  in  RAN  is  accounted  for  by  noncognitive 
(environmental) factors. The possible connection between environmental
factors and RAN is supported by two pieces of evidence: First,
behavioral-genetic studies have shown that although RAN is highly
heritable, a significant proportion of its variance is also accounted
for by environment (e.g., Byrne et al., 2007; Davis, Knopik, Olson,
Wadsworth, & DeFries, 2001; Samuelsson et al., 2007).1 Second, a few
studies that examined the role of home
learning  environment  on  emergent  literacy  skills  have  reported 
significant  correlations  between  RAN  and  home  learning  environ¬ 
ment  (Manolitsis,  Georgiou,  &  Tziraki,  2012;  Niklas  &  Schneider, 
2013).  In  this  study,  we  examined  the  relationship  of  two  envi¬ 
ronmental  factors  (home  learning  environment  and  socioeconomic 
status)  with  RAN  performance. 

The  Present  Study 

The  purpose  of  this  study  was  to  examine  the  relationship  of 
both  cognitive  (attention,  visual  processing,  conceptual  process¬ 
ing,  semantic  processing,  phonological  processing,  STM,  articula¬ 
tion,  and  speed  of  processing)  and  environmental  (home  learning 
environment  and  socioeconomic  status)  factors  with  RAN.  Our 
study  addresses  three  important  gaps  in  the  literature:  First,  we  use 
existing  theoretical  models  of  RAN  (Georgiou,  2010;  Wolf  & 
Bowers,  1999)  to  guide  the  selection  of  the  processing  skills  that 
may  relate  to  RAN.  Second,  we  assessed  not  only  RAN  (when 
stimuli  are  presented  all  at  once)  but  also  discrete  naming  (when 
stimuli  are  presented  one  at  a  time).  This  helps  us  test  an  interest¬ 
ing  hypothesis.  Arguably,  with  the  exception  of  seriality  and 
articulation  (assuming  voice  onset  latencies  are  used),  discrete 
naming  encompasses  all  subprocesses  involved  in  RAN.  Thus, 
controlling  for  the  effects  of  discrete  naming  should  diminish  the 
relationship  of  these  subprocesses  with  RAN.  If  this  is  true,  then 
the  reason  why  RAN  correlates  more  strongly  with  reading  than 
discrete  RAN  could  be  attributed  to  factors  such  as  eye-movement 
control  that  are  inherent  to  processing  of  an  array  of  stimuli  rather 
than  to  discrete  naming. 
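
The reasoning behind this hypothesis can be sketched with a partial correlation on synthetic scores. The example is a simplified stand-in and not the analysis used in this study: the three score lists and the helper names (`pearson`, `partial_corr`) are hypothetical, and the data are deliberately built so that a subprocess measure and RAN share variance only through what discrete naming also captures. Under that assumption, the association between the subprocess and RAN is sizeable on its own and disappears once discrete naming is controlled.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores from a balanced 2 x 2 x 2 design.
discrete, subskill, ran = [], [], []
for shared in (-1, 1):        # component common to all three measures
    for e1 in (-1, 1):        # variation specific to the subprocess measure
        for e2 in (-1, 1):    # variation specific to serial naming (RAN)
            discrete.append(shared)
            subskill.append(shared + e1)
            ran.append(shared + e2)

zero_order = pearson(subskill, ran)                 # association on its own
controlled = partial_corr(subskill, ran, discrete)  # discrete naming removed
```

Here the zero-order correlation is .50, whereas the partial correlation controlling for discrete naming is zero (up to floating-point rounding). Whatever variance in RAN remains after this control must then come from sources that discrete naming does not share, such as the serial processing of an array.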

Finally,  our  study  was  conducted  in  Chinese.  Several  studies 
have  already  shown  that  RAN  is  a  strong  predictor  of  reading  in 
Chinese  (e.g.,  Chow,  McBride-Chang,  &  Burgess,  2005;  Liao  et 
al.,  2008,  2015;  McBride-Chang  &  Suk-Han  Ho,  2005;  McBride- 
Chang  &  Kail,  2002;  J.  Pan  et  al.,  2011;  Tan,  Spinks,  Eden, 
Perfetti,  &  Siok,  2005;  see  also  Song,  Georgiou,  Su,  &  Shu,  2016, 
for  a  meta-analysis).  Recently,  Georgiou,  Aro,  Liao,  and  Parrila 
(2016)  have  also  demonstrated  that  the  paths  from  RAN  to  reading 
fluency  in  Chinese  do  not  differ  from  those  in  alphabetic  orthog¬ 
raphies  like  English  and  Finnish.  However,  compared  to  alphabetic 
orthographies,  examining  the  relationship  of  different  subpro¬ 
cesses  with  RAN  in  Chinese  gives  us  an  additional  advantage. 
Specifically,  researchers  have  suggested  that  because  children  in 
North  America  and  Europe  are  not  all  familiar  with  the  names  of 
letters and digits until they go to Grade 1, alphanumeric RAN tasks
should  not  be  administered  to  preschool  children  (Kirby  et  al., 
2010).  In  contrast,  in  China,  children  start  kindergarten  at  the  age 
of  3  and  recognizing/naming  Arabic  numerals  is  among  the  first 
things  they  learn.  Given  that  alphanumeric  RAN  has  been  found  to 
be  more  strongly  related  to  reading  (e.g.,  Araujo  et  al.,  2015;  Song 
et  al.,  2016)  and  that  RAN  is  more  strongly  related  with  reading  in 
the early grades (e.g., Araujo et al., 2015), examining RAN in
Chinese kindergarten children allows us to assess not only object
naming but also digit naming.

1 We acknowledge that the home learning environment is only one
aspect of environment.

LIU AND GEORGIOU

Method 

Participants 

Letters  of  information  describing  the  study  were  sent  to  parents 
of  all  162  second-year  kindergarten  children  in  three  inner-city 
kindergartens  in  Xi’an,  China.  One  hundred  forty-four  children 
with  parental  consent  were  subsequently  invited  to  participate  in 
our  study.  Three  children  who  did  not  assent  to  participate  were 
further  excluded  from  the  study,  thus  leaving  our  sample  with  141 
children  (71  girls  and  70  boys;  mean  age  =  58.99  months,  SD  = 
3.17;  range:  54-65  months).  Most  children  came  from  families  of 
middle  socioeconomic  background  (based  on  mother’s  occupation 
and  education)  and  none  was  diagnosed  with  any  intellectual, 
behavioral,  or  sensory  deficits.  All  children  were  native  Mandarin 
speakers  and  had  normal  or  corrected-to-normal  vision. 

The  kindergartens  that  participated  in  our  study  follow  the  same 
curriculum.  The  teachers  were  all  female  and  had  more  than  5 
years  of  teaching  experience.  According  to  the  Guidelines  for 
Learning  and  Development  for  Children  ages  3-6  (Ministry  of 
Education  of  the  People’s  Republic  of  China,  2012),  children  4  to 
5  years  old  should  demonstrate  active  interest  in  reading  books, 
enjoy  telling  stories  they  heard  or  read  with  others,  have  adequate 
listening  comprehension  skills,  and  generate  stories  based  on  pic¬ 
tures.  Importantly,  no  direct  instruction  on  reading  and  writing  is 
expected  during  this  period. 

Parents  also  participated  in  the  study  by  completing  a  question¬ 
naire  on  family  background  (mother’s  education  and  occupation) 
and  on  the  frequency  of  engaging  in  different  home  literacy  and 
numeracy  activities  with  their  child  (see  Measures  section).  The 
questionnaire was filled out by 112 mothers, 10 fathers, and 9
families  where  parents  responded  together.  The  families  of  10 
children  did  not  return  the  questionnaire. 
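
As a bookkeeping check, the participant and questionnaire counts reported in this section can be reconciled with a few lines of arithmetic. The variable names are illustrative, and the two rates at the end are derived here for convenience; they are not figures reported in the text.

```python
# Counts as reported in the Participants section.
invited = 162          # children whose parents received the letter
with_consent = 144     # children with parental consent
no_assent = 3          # children who did not assent to participate
final_sample = with_consent - no_assent

girls, boys = 71, 70

# Questionnaire respondents.
by_mothers, by_fathers, by_both = 112, 10, 9
returned = by_mothers + by_fathers + by_both
not_returned = final_sample - returned

# Derived rates (not reported in the text).
consent_rate = with_consent / invited
return_rate = returned / final_sample
```

The totals agree with the text: 141 children in the final sample, 131 returned questionnaires, and 10 families who did not return one.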

Measures 

Home  learning  environment  (HLE).  A  questionnaire  that 
consisted  of  two  parts  was  sent  home  to  be  filled  out  by  parents.  In 
part  A,  we  asked  for  mother’s  highest  attained  education  and 
current occupation.2 Educational background was coded on eight levels
(ranging from finished elementary school to finished graduate
studies). Mother’s occupation was scored on an 8-point scale
provided by the Department of Human Resources and Social
Security of Shaanxi Province (2013). To assign mother’s occupation
to one of the eight scale points, we first converted the reported
occupation to a salary estimate and then allocated the estimated
annual income to one of the eight salary brackets (1 = no salary
to 8 = 100,000 RMB or more). Based on Xi’an’s Statistical
Yearbook  for  2014  (Xi’an  Municipal  Bureau  of  Statistics,  2015), 
mother’s  salary  in  our  sample  was  representative  of  the  general 
population  in  Xi’an.  Socioeconomic  status  was  the  sum  of  moth¬ 
er’s  education  and  occupation  score  (maximum  =  16). 
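
The composite can be written out as a short sketch. It assumes that both components are coded as integers from 1 to 8, which is stated for occupation and inferred here for the eight education levels; the function name `ses_score` and the example values are hypothetical. The salary cut-offs between Brackets 1 and 8 are not given in the text, so the sketch takes the bracket as already assigned.

```python
def ses_score(education_level, occupation_level):
    """Sum of two 1-8 ratings, giving a composite between 2 and 16."""
    for name, value in (("education", education_level),
                        ("occupation", occupation_level)):
        if not 1 <= value <= 8:
            raise ValueError(f"{name} level must be between 1 and 8, got {value}")
    return education_level + occupation_level


# Hypothetical mothers: (education level, occupation/salary bracket).
examples = [(8, 8), (1, 1), (5, 6)]
scores = [ses_score(edu, occ) for edu, occ in examples]
```

Under this coding the three hypothetical mothers receive scores of 16, 2, and 11, and the highest possible score equals the stated maximum of 16.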

In  Part  B,  parents  were  asked  to  indicate  the  frequency  of 
engaging  in  different  home  reading-  and  math-related  activities 
when  their  child  was  in  the  second  year  of  kindergarten  using  a 
5-point  Likert  scale  (0  =  never/less  than  5  min  a  day  to  4  =  every 


Table  1 

Descriptive  Statistics  on  the  Questions  Included  in  the 
Parents’  Questionnaire  


Question                                                                                       M     SD   Min   Max

1. Socioeconomic status (max = 16)                                                         10.50   2.83     3    16
2. How often did you teach your child to recognize numbers?                                 2.35    .88     0     4
3. How often did you teach your child to do simple calculations (e.g., 2 + 3, 5 - 1)?       2.29    .93     0     4
4. How often did you teach your child to write numbers?                                     2.17    .88     0     4
5. How many hours did you read to your child on a typical weeknight (Monday to Friday)?     1.37    .79     0     4
6. How many hours did you read to your child on the weekend (Saturday and Sunday)?          1.61    .87     0     4
7. How much time did you (or someone else at home) spend reading a story to your child?     1.35    .90     0     3
8. How often did you teach your child pinyin or letters?                                    1.17   1.06     0     4
9. How often did you teach your child to read Chinese characters?                           2.11   1.03     0     4
10. How often did you teach your child to write Chinese characters?                         1.21   1.04     0     4

Note.  The  descriptives  are  based  on  131  questionnaires.  For  Questions  2 
to  4  and  8  to  10,  the  options  were  0  =  never,  1  =  less  than  once  a  month, 
2  =  a  few  times  a  month,  3  =  a  few  times  a  week,  and  4  =  daily.  For 
Questions  5  to  7,  the  options  were  0  =  none,  1  =  1-30  min,  2  =  31-60 
min,  3  =  1-2  hr,  and  4  =  more  than  2  hr.  Min  =  minimum;  max  = 
maximum. 


day/2 hr or more). This part of the questionnaire consisted of nine
questions: three on direct teaching of reading (e.g.,
“How often did you teach your child to recognize Chinese characters?”),
three on direct teaching of math (e.g., “How
often did you teach your child to do simple calculations?”), and
three on shared reading (e.g., “How often did you read
to your child over the weekend?”). The questions for the reading-related
activities were adapted from the work of Senechal (2006)
and  the  questions  for  the  math-related  activities  were  adapted  from 
the  work  of  Lefevre,  Clarke,  and  Stringer  (2002).  The  descriptive 
statistics  on  these  questions  are  reported  in  Table  1. 

We  subsequently  examined  the  factor  structure  of  the  items 
included  in  part  B  of  the  questionnaire  by  performing  a  principal 
axis  factor  analysis  with  direct  oblimin  rotation.  In  line  with  the 
findings  of  previous  studies  (see,  e.g.,  Deng,  Silinskas,  Wei,  & 
Georgiou,  2015;  Manolitsis,  Georgiou,  Stephenson,  &  Parrila, 
2009;  Senechal,  2006),  the  results  showed  that  there  were  two 
factors  with  eigenvalues  greater  than  1  accounting  for  65%  of  the 
variance. The items on direct teaching (reading and mathematics)
loaded on one factor (loadings ranged from .701 to .883), and the
items  on  shared  book  reading  loaded  on  a  second  factor  (loadings 


2 We did not ask for father’s education and occupation for two reasons:
First,  from  our  experience  most  questionnaires  are  filled  out  by  mothers, 
and  second,  fathers  (or  mothers  reporting  on  behalf  of  fathers)  in  China  are 
reluctant  to  reveal  information  about  father’s  occupation  (this  question  is 
perceived  as  being  too  personal). 
ranged from .808 to .888). Following Senechal’s (2006) protocol,
we  called  the  first  factor  formal  home  learning  environment  and 
the  second  factor  informal  home  learning  environment.  The  two 
factors  correlated  .30  with  each  other. 

Nonverbal  IQ.  Nonverbal  matrices  from  the  Cognitive  As¬ 
sessment  System  (CAS)  battery  (Naglieri  &  Das,  1997)  were 
administered  to  assess  nonverbal  IQ.  Children  were  presented  with 
a  pattern  of  shapes/geometric  designs  that  was  missing  a  piece  and 
were  asked  to  choose  from  among  five  or  six  alternatives  the  piece 
that  would  accurately  complete  the  pattern.  A  discontinuation  rule 
of  four  consecutive  mistakes  was  applied.  A  participant’s  score 
was  the  total  number  correct  (maximum  =  33). 

Attention.  Expressive  Attention  from  the  CAS  battery  (Na¬ 
glieri  &  Das,  1997)  was  used  as  a  measure  of  attention.  Children 
were  shown  animals  that  were  either  small  (a  bird,  a  mouse,  a 
butterfly,  a  cat,  and  a  frog)  or  big  (an  elephant,  a  whale,  a  horse, 
a  bear,  and  a  dinosaur)  and  were  asked  to  name  them  as  fast  as 
possible  by  saying  “big”  or  “small”  depending  on  the  actual  size  of 
the  animals.  In  the  neutral  condition,  all  of  the  pictures  were  of  the 
same  physical  size  and  were  arranged  in  semirandom  order  in  five 
rows  of  eight;  in  the  congruent  condition,  the  size  of  the  pictures 
was  consistent  with  the  actual  size  of  the  animals;  and  in  the 
incongruent  condition,  the  pictures  of  the  animals  were  in  contrast 
to  their  actual  size  most  of  the  time.  For  each  condition,  children 
named  stimuli  from  a  practice  card  prior  to  timed  testing.  A 
participant’s  score  was  the  time  difference  between  the  incongru¬ 
ent  and  the  neutral  condition. 

Visual  processing.  Planned  Search  was  adopted  from  the 
work  of  Das,  Naglieri,  and  Kirby  (1994)  to  assess  visual  process¬ 
ing.  In  this  task,  the  children  were  asked  to  match  the  target  object 
or  digit  in  the  center  box  with  an  object  or  digit  in  the  visual  field 
as  quickly  as  they  could.  The  children  were  asked  to  point  to  the 
object  or  number  in  the  area  around  the  boxed  target  that  matched 
the  target.  Each  item  consisted  of  two  searches,  one  on  the  top  half 
and  one  on  the  bottom  half  of  the  page.  The  task  consisted  of  12 
items  of  increasing  difficulty.  Four  items  required  children  to 
match  the  target  object  among  digits;  four  items  required  children 
to  match  the  target  digit  among  digits;  and  four  items  required 
children  to  match  the  target  object  among  objects.  Each  target  had 
only  one  match  on  a  page.  Timing  began  when  each  item  was 
exposed  to  the  children  and  stopped  when  children  completed  both 
searches  on  a  page.  A  participant’s  score  was  the  average  time  to 
complete  all  12  items. 

Conceptual  processing.  A  categorization  task  was  developed 
by  the  authors  in  which  children  were  presented  with  a  card  that 
contained  five  objects  (monkey,  cat,  elephant,  strawberry,  and 
banana)  repeated  seven  times  each  and  arranged  in  semirandom 
order  in  five  rows  of  seven.  Children  were  asked  to  say  “yes”  as 
fast  as  possible  to  pictures  of  animals  and  “no”  to  pictures  of  fruits. 
A  practice  item  preceded  each  test  item.  A  participant’s  score  was 
the  average  time  needed  to  name  all  35  pictures  across  two  trials. 
The average number of errors was negligible (less than 1) and for
this  reason  it  was  not  considered  further. 

Semantic  processing.  Semantic  processing  was  assessed  with 
two  tasks:  The  neutral  condition  of  Expressive  Attention  (see 
above)  and  the  Digit  Big/Small  task.  In  the  Digit  Big/Small  task, 
children  viewed  a  card  with  five  rows  of  seven  Arabic  numerals 
and  were  asked  to  say,  as  fast  as  possible,  “small”  each  time  they 
came  across  numbers  1,  2,  and  3,  and  “big”  every  time  they  came 


across  numbers  7  and  9.  A  participant’s  score  was  the  average  time 
needed  to  name  all  35  numbers  across  two  trials.  A  practice  item 
preceded  each  test  item.  The  average  number  of  errors  was  neg¬ 
ligible  (less  than  1)  and  for  this  reason  it  was  not  considered 
further.  To  calculate  a  score  for  semantic  processing  we  averaged 
the  z  scores  of  the  two  tasks. 
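
The composite described above can be illustrated with a minimal sketch. The naming times below are hypothetical, and the use of the sample standard deviation (n - 1) for standardization is an assumption, because the text does not state which estimator was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores (sample SD is an assumption, not stated in the text)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def composite(task_a, task_b):
    """Average the z scores of two tasks, child by child."""
    za, zb = z_scores(task_a), z_scores(task_b)
    return [(a + b) / 2 for a, b in zip(za, zb)]

# Hypothetical naming times in seconds for four children.
animal_times = [60.0, 48.5, 72.3, 55.1]   # neutral condition of Expressive Attention
digit_times = [51.0, 40.2, 63.8, 49.9]    # Digit Big/Small task
semantic = composite(animal_times, digit_times)
```

Because both tasks are timed, a higher composite in this sketch corresponds to slower semantic processing.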

STM.  The  Chinese  version  of  the  Forward  Digit  Span  subtest 
(Wechsler,  1974)  was  used  to  assess  children’s  short-term  mem¬ 
ory.  Children  were  asked  to  repeat  a  sequence  of  digits  in  the  same 
order  they  had  heard  them  from  the  experimenter.  Testing  started 
with  a  series  of  two  digits  and  ended  with  nine  digits;  each 
difficulty  level  contained  two  items.  The  task  was  discontinued 
after  children  got  both  items  of  the  same  difficulty  level  wrong.  A 
participant’s  score  was  the  total  number  of  correctly  recalled  trials 
(maximum  =  16). 

Phonological  processing.  A  syllable  deletion  task  was  used  to 
assess  phonological  processing.  It  consisted  of  4  practice  trials  and 
20  experimental  trials.  Children  were  asked  to  say  what  was  left  in 
a Chinese word after deleting one of its syllables (e.g.,
“Say /qi4 che1 zhan4/ [meaning bus station]. Now say /qi4 che1
zhan4/ without /zhan4/” would be /qi4 che1/ [meaning bus]). To increase
the task difficulty, five nonwords that conformed to the
phonological constraints of Chinese but do not exist in modern
Mandarin were added to the 15 real words. The task consisted of
8  two-syllable  items  and  12  three-syllable  items.  Half  of  the 
two-syllable  items  required  deleting  the  first  syllable  and  the  other 
half  the  last  syllable,  while  in  the  three-syllable  items,  one  third  of 
the  items  required  deleting  the  first,  one  third  the  middle,  and  one 
third  the  final  syllable,  respectively.  One  point  was  awarded  for 
each  correct  answer  and  a  discontinuation  rule  of  five  consecutive 
errors  was  applied. 

Articulation.  Speech  Rate  from  the  CAS  battery  (Naglieri  & 
Das,  1997)  was  used  to  assess  articulation.  Children  were  asked  to 
repeat  monosyllabic  triplets  of  Chinese  words  (e.g., 
meaning  duck-dog-book)  10  times  as  fast  as  possible.  The  task 
consisted  of  five  items  and  the  participant’s  score  was  the  average 
time  across  the  five  items. 

Speed  of  processing.  A  choice  reaction  time  (RT)  task  de¬ 
signed  in  E-Prime  Professional  2.0  program  was  used  to  assess 
children’s  speed  of  processing.  Two  black  circles  (one  to  the  left 
and  one  to  the  right)  were  presented  on  a  white  background  on  a 
laptop  computer  with  variable  interstimulus  intervals  (ISIs;  100 
ms,  150  ms,  200  ms,  250  ms,  and  300  ms)  to  ensure  participants 
did not get used to a rhythmic order. Each circle subtended 2.86° of
visual angle at a viewing distance of 50 cm. Each
ISI occurred six times, three times with the left circle and three times
with the right circle appearing first, in random order across the 30
experimental trials, which were preceded by a block of eight practice
trials. Children were asked to press the “z” button if the left black
circle  appeared  first  and  the  “m”  button  if  the  right  black  circle 
appeared  first.  The  average  latency  was  calculated  across  all  30 
presentations  and  was  used  as  the  participant’s  score. 
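
To make the geometry and the trial structure of this task concrete, the following sketch converts a visual angle into an on-screen size and builds a 30-trial list. It assumes that the reported value of 2.86 is in degrees and that each of the five ISIs was crossed with the two sides three times; the function names and the random seed are hypothetical and are not taken from the E-Prime script.

```python
import math
import random

def stimulus_size_cm(angle_deg, distance_cm):
    """Physical size that subtends angle_deg at distance_cm: s = 2 d tan(theta / 2)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def visual_angle_deg(size_cm, distance_cm):
    """Inverse relation: theta = 2 atan(s / (2 d))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

ISIS_MS = [100, 150, 200, 250, 300]

def build_trials(seed=0):
    """5 ISIs x 2 sides x 3 repetitions = 30 trials, shuffled (assumed design)."""
    trials = [(isi, side)
              for isi in ISIS_MS
              for side in ("left", "right")
              for _ in range(3)]
    random.Random(seed).shuffle(trials)
    return trials

diameter_cm = stimulus_size_cm(2.86, 50)   # roughly 2.5 cm on the screen
trials = build_trials()
```

Under these assumptions, a circle of about 2.5 cm in diameter viewed from 50 cm corresponds to the reported visual angle.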

RAN.  Children  were  asked  to  name  as  quickly  as  possible 
recurring  Arabic  numerals  (6,  2,  4,  9,  7)  and  objects  (key,  butterfly, 
cake,  schoolbag,  banana)  that  were  arranged  semirandomly  in 
five  rows  of  seven.  For  both  tasks,  children  were  asked  to  name  the 
array  of  digits  and  objects  twice  and  a  participant’s  score  was  the 
average  time  to  name  all  stimuli  across  the  two  trials.  The  average 
number  of  errors  in  both  RAN  tasks  was  negligible  (less  than  1) 
and  for  this  reason  it  was  not  considered  further. 
Discrete naming. The five Arabic numerals and objects used
in  RAN  were  used  to  assess  discrete  naming.  Each  stimulus  was 
presented  individually  in  the  center  of  a  computer  screen  with  a 
brief  (800  ms)  blank  screen  in  between  presentations.  Stimuli  were 
presented  in  random  order  and  each  stimulus  was  presented  seven 
times  for  a  total  of  35  trials  for  digits  and  35  trials  for  objects.  The 
time  between  the  presentation  of  each  stimulus  and  the  onset  of  the 
vocal  response  was  measured  with  E-Prime  Professional  2.0  pro¬ 
gram  on  a  computer  equipped  with  Serial  Response  Box,  and  a 
head-mounted  microphone  was  used  to  record  the  voice-onset  RT 
across  all  35  presentations.  Participants  were  given  a  practice 
session  with  five  items  before  the  actual  testing.  A  participant’s 
score  was  the  average  voice-onset  RT  across  the  35  stimuli  for 
digits  and  objects.  Naming  errors  (1.8%  of  the  response  times  in 
digits  and  1.2%  of  the  response  times  in  objects)  and  RTs  less  than 
100  ms  (4.6%  of  the  response  times  in  digits  and  6.3%  of  the 
response  times  in  objects)  were  excluded  from  the  calculation  of  a 
participant’s  score. 
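
The scoring rule for discrete naming can be sketched as follows. The trial representation (a list of latency and accuracy pairs) and the function name are hypothetical; only the two exclusion criteria, naming errors and RTs below 100 ms, come from the description above.

```python
from statistics import mean

MIN_VALID_RT_MS = 100  # RTs less than 100 ms are excluded

def discrete_naming_score(trials):
    """Mean voice-onset RT over valid trials.

    trials: list of (rt_ms, correct) tuples, one per stimulus presentation.
    Returns None if no valid trial remains.
    """
    valid = [rt for rt, correct in trials
             if correct and rt >= MIN_VALID_RT_MS]
    return mean(valid) if valid else None

# Hypothetical data: one anticipatory response and one naming error are dropped.
example = [(850, True), (90, True), (1200, False), (950, True)]
score = discrete_naming_score(example)
```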

Procedure 

All  children  were  individually  assessed  in  a  quiet  room  at  school 
during  school  hours  by  trained  graduate  students.  Testing  was 
completed  in  three  sessions  of  20  min  each  in  April/May.  In 
Session  A,  measures  of  nonverbal  IQ,  conceptual  processing, 
STM,  and  phonological  processing  were  administered.  In  Session 
B,  measures  of  attention,  semantic  processing,  and  RAN  were 
administered.  Finally,  in  Session  C,  children  were  assessed  on 
measures  of  visual  processing,  articulation,  speed  of  processing, 
and  discrete  naming.  The  order  of  the  tasks  within  each  session 
was  fixed  and  there  was  a  short  rest  in  between  each  task.  Both 
scoring  and  data  entry  were  double-checked  by  Cuina  Liu,  and  no 
errors  were  found.  There  were  no  missing  data,  and  the  analyses 
were  performed  with  a  full  dataset. 

Results 

Preliminary  Data  Analyses 

Table  2  presents  the  descriptive  statistics  for  the  cognitive 
measures  used  in  our  study.  Before  conducting  any  analyses,  we 
examined the distributional properties of the measures. The measures
used to operationalize attention, conceptual processing, semantic
processing, speed of processing, RAN (digits and objects),
and discrete naming (digits and objects) were positively skewed.
To  normalize  their  distributions,  we  log-transformed  their  scores 
and  the  transformed  scores  were  used  in  subsequent  analyses. 
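
The effect of such a transformation can be illustrated with a short sketch. The naming times are hypothetical, the moment-based skewness index is one of several possible definitions, and the natural logarithm is an assumption, since the base of the logarithm is not reported (the base does not affect skewness).

```python
import math
from statistics import mean

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical naming times in seconds with a long right tail.
times = [21.1, 30.5, 33.0, 36.2, 38.9, 41.7, 45.3, 52.8, 70.4, 118.2]
logged = [math.log(t) for t in times]

raw_skew = skewness(times)
log_skew = skewness(logged)
```

In this example the transformation reduces, but does not eliminate, the positive skew, which is why distributions are typically inspected again after transforming.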

Table  3  presents  the  correlations  between  the  variables  used  in 
our study. RAN digits and objects correlated significantly with all
other variables except for socioeconomic status (SES), HLE
informal, and STM. Not surprisingly, the highest correlations were
with  discrete  naming  (for  digits:  r  =  .67;  for  objects:  r  =  .56),  but 
moderate  correlations  were  also  obtained  with  phonological  pro¬ 
cessing,  conceptual  processing,  semantic  processing,  and  articula¬ 
tion. 

Regression  Analyses 

To  examine  the  predictors  of  RAN  digits  and  objects,  we  ran 
hierarchical  regression  analyses.  To  simplify  the  analyses,  SES, 


Table  2 

Descriptive  Statistics  for  the  Measures  Used  in  the  Study 


Measure                          M       SD      Min      Max   Split-half reliability

Nonverbal IQ
  Nonverbal Matrices          8.06     2.75        0       18     .84
Attention
  Expressive Attention       29.64    12.07      .62    64.35     .80
Visual processing
  Planned Search             16.59     6.21     5.88    34.83     .86
Conceptual processing
  Categorization             38.88     8.49    24.04    74.45     .90
Semantic processing
  Animal                     60.36    12.74    32.12   120.62     .92
  Digit big/small            51.23    11.72    28.00    99.08     .85
Short-term memory
  Digit Span Forward         10.45     2.29        6       16     .80
Phonological processing
  Syllable deletion           9.50     4.27        0       19     .91
Articulation
  Speech Rate                10.77     2.07     7.27    17.59     .90
Speed of processing
  Choice Reaction Time        1.03      .45      .38     3.15     .82
RAN
  Digits                     41.68    15.16    21.14   118.19     .90
  Objects                    54.63    15.04    32.47   160.89     .82
Discrete naming
  Digits                      1.02      .52      .53     4.78     .87
  Objects                     1.13      .46      .62     3.72     .87

Note. All response times are in seconds. Min = minimum; max =
maximum; RAN = rapid automatized naming.


STM,  and  HLE  informal  were  left  out  from  the  analyses,  because 
they  did  not  correlate  significantly  with  either  RAN  task.  In  Model 
1,  we  entered  Age  and  Nonverbal  IQ  at  Step  1  of  the  regression 
equation  and  HLE  formal,  attention,  visual  processing,  conceptual 
processing,  semantic  processing,  phonological  processing,  articu¬ 
lation,  and  speed  of  processing  at  Step  2  of  the  regression  equation. 
A  second  set  of  analyses  was  performed  with  discrete  naming 
entered  at  Step  2  of  the  regression  equation  and  all  other  variables 
at  Step  3  (see  Model  2).  Significance  levels,  standardized  beta 
coefficients, and R² changes are presented in Table 4.
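
The logic of entering predictors in steps and recording the change in R² can be sketched as follows. This is a generic illustration with hypothetical toy data for eight children, not the software or data used in the study; it computes only the R² change per step and omits the standardized coefficients and significance tests.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of an OLS model with intercept; predictors is a list of columns."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def hierarchical_r2_changes(steps, y):
    """Enter blocks of predictors cumulatively; return the R^2 change per step."""
    entered, previous, changes = [], 0.0, []
    for block in steps:
        entered = entered + block
        current = r_squared(entered, y)
        changes.append(current - previous)
        previous = current
    return changes

# Hypothetical toy data (not from the study).
age = [60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0, 74.0]
iq = [8.0, 5.0, 9.0, 7.0, 10.0, 6.0, 11.0, 9.0]
articulation = [12.1, 11.5, 10.2, 11.9, 9.8, 10.9, 9.1, 10.0]
ran = [55.0, 52.3, 44.1, 50.8, 40.2, 47.5, 36.9, 41.0]

# Step 1: control variables; Step 2: one cognitive predictor.
changes = hierarchical_r2_changes([[age, iq], [articulation]], ran)
```

The second entry of `changes` is the variance in the outcome explained by the Step 2 block over and above the Step 1 controls, which is the quantity reported as the R² change for each step.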

In Model 1, the results showed that after controlling for Age and
Nonverbal IQ, HLE formal, visual processing, phonological processing,
and articulation significantly predicted both RAN digits
and RAN objects. Semantic processing also predicted RAN objects.
Attention, conceptual processing, and speed of processing
did  not  predict  either  RAN  task.  Controlling  for  the  effects  of 
discrete  naming  reduced  the  effects  of  most  subprocesses  to  non¬ 
significant  levels  (see  Model  2).  However,  articulation  continued 
to  predict  both  RAN  tasks  and  visual  processing  continued  to 
predict  RAN  objects. 

Discussion 

The  purpose  of  this  study  was  to  examine  the  cognitive  and 
environmental  correlates  of  RAN.  To  select  the  cognitive  pro¬ 
cesses,  we  relied  on  Wolf  and  Bowers’  (1999)  letter-naming  model 
and  on  Georgiou’s  (2010)  object-naming  model.  In  line  with  both 
naming models, our findings showed that RAN is multicomponential.
Phonological processing, visual processing, and articulation
Table 3

Correlations Between the Measures in Our Study

Measure                           1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17

1. Age (months)                      -.23   [?]   [?]   .19   .25  -.22  -.28  -.33  -.02   .31  -.10  -.22  -.19  -.22  -.34  -.30
2. SES                                      .15   .28   .03   .01  -.05   .11   .07   .24   .23  -.15   .08  -.15  -.09  -.07   .01
3. HLE formal                                     .30  -.11  -.02  -.09  -.06   .00   .11   .01  -.06   .01  -.21  -.19  -.20  -.06
4. HLE informal                                         .11  -.04  -.08   .07  -.08   .11   .10  -.05   .26  -.10  -.03   .03   .13
5. Nonverbal IQ                                              -.07  -.23   .16  -.22   .06   .28  -.15  -.02  -.22  -.17  -.24  -.21
6. Attention                                                        .20   .44   .61   .03  -.17   .16   .44   .29   .33   .28   .30
7. Visual processing                                                      .27   .26   .07  -.22   .04   .07   .31   .32   .37   .28
8. Conceptual processing                                                        .68   .13  -.21   .33   .34   .38   .38   .46   .46
9. Semantic processing                                                                .06  -.31   .29   .41   .43   .45   .43   .44
10. Short-term memory                                                                       .36  -.04  -.03  -.07  -.05   .04   .12
11. Phonological processing                                                                      -.37  -.21  -.40  -.39  -.35  -.26
12. Articulation                                                                                        .14   .39   .39   .29   .21
13. Speed of processing                                                                                       .27   .25   .36   .42
14. RAN digits                                                                                                      .71   .67   .47
15. RAN objects                                                                                                           .50   .56
16. Discrete digit naming                                                                                                       .78
17. Discrete object naming

Note.  Higher  scores  in  accuracy  measures  (better  performance)  should  be  related  to  lower  scores  in  reaction  time  measures  (better  performance);  thus, 
correlations  should  be  negative.  Correlations  lower  than  .17  are  nonsignificant;  correlations  between  .17  and  .22  are  significant  at  the  .05  level;  and 
correlations  higher  than  .22  are  significant  at  the  .01  level.  SES  =  socioeconomic  status;  HLE  =  home  learning  environment;  RAN  =  rapid  automatized 
naming. 
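
The significance thresholds in the note can be checked with a short calculation. The sketch below assumes a sample of 141 children (131 returned questionnaires plus 10 that were not returned), which is inferred rather than stated in this section, and it uses the Fisher z approximation with two-tailed tests rather than the exact t distribution.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha):
    """Approximate two-tailed critical |r| via Fisher's z: tanh(z_crit / sqrt(n - 3))."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))

N_CHILDREN = 141  # assumption: 131 returned + 10 unreturned questionnaires

r_05 = critical_r(N_CHILDREN, 0.05)
r_01 = critical_r(N_CHILDREN, 0.01)
```

Under these assumptions the two critical values round to .17 and .22, in line with the cutoffs given in the note to Table 3.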


were  unique  correlates  of  both  RAN  tasks  (see  Model  1).  The 
finding  regarding  phonological  processing  is  interesting  because 
previous  studies  reported  nonsignificant  effects  of  phonological 
awareness  (when  considered  along  with  other  predictors)  on  RAN 
(see Decker et al., 2013; Narhi et al., 2005).3 This discrepancy is
likely  due  to  the  overlap  between  phonological  awareness  and 

Table  4 


Results  of  Hierarchical  Regression  Analyses  Predicting  RAN 


                                   RAN digits            RAN objects
Step  Variable                     β          ΔR²        β          ΔR²

Model 1
1     Age                          -.171*     .08**      -.226**    .08**
      Nonverbal IQ                 -.190*                -.120
2     HLE formal                   -.178*     .32***     -.155*
      Attention                    -.065                 -.028
      Visual processing             .198*                 .200**
      Conceptual processing         .067                  .026
      Semantic processing           .206                  .242*
      Phonological processing      -.156*                -.157*
      Articulation                  .250**                .282***
      Speed of processing           .062                  .025

Model 2
2     Discrete naming               .653***   .36***      .492***   .22***
3     HLE formal                   -.088      .09**      -.124      .18***
      Attention                    -.048                 -.002
      Visual processing             .080                  .157*
      Conceptual processing        -.041                 -.025
      Semantic processing           .165                  .175
      Phonological processing      -.128                 -.143
      Articulation                  .191**                .260***
      Speed of processing          -.023                 -.055

Note. HLE = home learning environment; RAN = rapid automatized
naming.

*p < .05. **p < .01. ***p < .001.


nonword  reading,  which  was  also  used  as  a  predictor  of  RAN  in 
the  aforementioned  studies.  An  alternative  explanation  may  relate 
to the age of the participants in our study. Given that the relationship
between phonological awareness and RAN is stronger in early
phases of reading development and declines thereafter (see Georgiou,
Parrila, Kirby, & Stephenson, 2008; Parrila et al., 2004), it
should not surprise us that phonological awareness was uniquely
associated with RAN in our study with kindergarten children, but
not in Decker et al.’s (2013) study or Narhi et al.’s
(2005) study, both of which used a sample of older children.

Our  finding  reinforces  the  argument  put  forward  by  Wolf  and 
Bowers  (1999)  that  the  ability  to  access  and  retrieve  phonological 
representations  from  long-term  memory  is  only  one  part  of  the 
equation  and  other  processes  account  for  individual  differences  in 
RAN  as  well.  This  is  also  supported  by  the  results  of  the  regression 
analyses  in  which  we  controlled  for  discrete  naming  (see  Model  2 
in  Table  4).  If  discrete  naming  requires  as  much  access  to  phono¬ 
logical  representations  as  RAN,  then  controlling  for  the  effects  of 
discrete  naming  should  minimize  the  effects  of  phonological  pro¬ 
cessing  on  RAN  (exactly  what  we  found  here). 

Visual  processing  and  articulation  were  also  uniquely  related  to 
RAN.  In  Wolf  and  Bowers’  (1999)  letter-naming  model,  the  visual 
components  responsible  for  lower  spatial  frequencies  provide  in¬ 
formation  about  the  global  shape  of  the  stimulus  and  the  visual 
components  responsible  for  higher  spatial  frequencies  provide 
information  about  the  finer  details  of  the  stimulus.  This  step  allows 
individuals to differentiate one stimulus (e.g., b) from similar-looking
stimuli (e.g., d, p). The importance of visual processing
has  also  been  documented  by  Stainthorp,  Stuart,  Powell,  Quinlan, 
and  Garwood  (2010)  who  showed  significant  visual  processing 


3  This  does  not  mean  that  phonological  awareness  is  not  significantly 
related to RAN. In fact, in their meta-analysis, Swanson, Trainin, Necoechea,
and Hammill (2003) reported a correlation of .28 between the
two. 
deficits in children with slow RAN. The fact that articulation also
accounted  for  individual  differences  in  RAN  should  not  surprise  us 
either.  This  is  in  line  with  the  finding  of  previous  studies  showing 
that  RAN  is  a  stronger  predictor  of  oral  reading  fluency  than  silent 
reading  fluency  (e.g.,  Georgiou,  Parrila,  Cui,  &  Papadopoulos, 
2013;  van  den  Boer,  van  Bergen,  &  de  Jong,  2014)  and  with  the 
findings of functional MRI studies showing that the RAN network
involves regions associated with articulation (Cummine, Szepesvari,
Chouinard, Hanif, & Georgiou, 2014) and that significant correlations
between the neural RAN and reading parameters are detected
in motor/articulatory regions (Cummine, Chouinard,
Szepesvari, & Georgiou, 2015). Articulation continues
to predict RAN after controlling for discrete naming
because the score in discrete naming does not include articulation
(it is the voice onset latency). Had we included articulation time in
discrete naming, the effects of articulation on RAN would likely
have become nonsignificant as well.

Our  results  further  showed  that  semantic  processing  was  a 
significant  correlate  of  RAN  objects.  Several  researchers  have 
shown  that  semantic  processing  is  an  integral  component  of  object 
naming  (e.g.,  Alario,  Ferrand,  Laganaro,  New,  Frauenfelder,  & 
Segui,  2004;  Cummine  et  al.,  2014;  Johnson,  Paivio,  &  Clark, 
1996).  Given  that  semantic  processing  contributes  to  RAN  objects, 
this may explain (a) why RAN objects takes longer to complete
than RAN digits (because it involves an extra step not present in
RAN digits; Georgiou et al., 2013; Wolf, Bally, & Morris, 1986),4
and (b) why RAN objects is usually a stronger correlate of
reading comprehension than of word reading (e.g., Kirby et al.,
2010).

Attention,  conceptual  processing,  STM,  and  speed  of  processing 
were  not  unique  correlates  of  RAN.  This  was  somewhat  expected 
based  on  the  findings  of  previous  studies  showing  that  RAN  is 
only  weakly  related  to  measures  of  attention  (e.g.,  Savage  et  al., 
2007;  Stringer,  Toplak,  &  Stanovich,  2004;  van  der  Sluis,  de  Jong, 
&  van  der  Leij,  2007),  conceptual  processing  (e.g.,  Decker  et  al., 
2013),  STM  (e.g.,  Narhi  et  al.,  2005;  Parrila  et  al.,  2004),  and 
speed  of  processing  (e.g.,  Decker  et  al.,  2013;  Liao  et  al.,  2015).  It 
is  possible  that,  being  domain-general  processes,  their  role  in  RAN 
is  to  facilitate  the  functioning  of  more  critical  subprocesses  (e.g., 
visual  processing,  phonological  processing).  For  example,  if  atten¬ 
tion  is  not  functioning  under  optimal  conditions,  this  may  impede 
the  successful  deployment  of  visual  processing  skills  and  will  slow 
down  naming  (e.g.,  Arnett  et  al.,  2012;  Semrud-Clikeman,  Guy, 
Griffin,  &  Hynd,  2000).  Likewise,  information  within  and  between 
subprocesses  must  be  processed  and  integrated  within  a  reasonable 
time frame (see also the “synchronicity” hypothesis by Breznitz,
2005). As long as speed of processing operates at an
optimal  level,  the  flow  of  information  from  one  subprocess  to  the 
next  will  happen  smoothly  and  the  naming  will  be  unimpaired. 
However,  an  alternative  explanation  for  the  nonsignificant  associ¬ 
ation  of  these  processing  skills  with  RAN  may  relate  to  the 
measures  used  to  operationalize  the  constructs.  For  example,  we 
administered  Expressive  Attention  (a  measure  of  inhibition)  to 
operationalize  attention.  Although  inhibition  is  an  aspect  of  atten¬ 
tion,  it  is  possible  that  it  is  not  that  aspect  of  attention  that  is 
responsible  for  its  relationship  with  RAN.  Notice  also  that  our 
measures  of  conceptual  processing  and  speed  of  processing  in¬ 
volved  inhibition,  which  may  have  reduced  their  chances  to 
emerge  as  significant  correlates  of  RAN.  Future  studies  should  try 


to  replicate  our  findings  covering  all  possible  aspects  of  the  in¬ 
volved  constructs  and  reducing  as  much  as  possible  the  overlap 
between  the  involved  constructs. 

The speed requirement was not a universal feature of the subprocesses
that were connected with RAN (i.e., phonological processing
was related to RAN without any time constraints, and many
of the other measures with speed requirements were not). This
supports  the  notion  that  RAN  is  not  a  measure  of  speed  of  pro¬ 
cessing  (see  also  Bowey,  Storey,  &  Ferguson,  2004;  Van  Den  Bos, 
Zijlstra,  &  van  den  Broeck,  2003,  for  evidence  from  factor  analytic 
approaches)  and  should  not  be  used  as  such.  For  example,  several 
researchers  in  the  area  of  mathematics  continue  to  use  RAN  tasks 
as  measures  of  speed  of  processing  (e.g.,  Berg,  2008;  Chan  &  Ho, 
2010;  Vanbinst,  Ghesquiere,  &  De  Smedt,  2015;  Vukovic  & 
Siegel,  2010).  The  connection  between  RAN  and  visual  processing 
(which  seems  to  have  a  component  of  processing  speed  in  it) 
supports  the  argument  put  forward  by  Vaessen,  Gerretsen,  and 
Blomert  (2009)  that  in  addition  to  phonological  processing  re¬ 
quirements,  RAN  tasks  require  the  fast  cross-modal  matching  of 
visual/orthographic  units  to  phonological  codes. 

Interestingly,  formal  home  learning  experiences  accounted  for 
unique  variance  in  RAN  over  and  above  the  effects  of  cognitive 
processes  (see  Model  1  in  Table  4).  Based  on  the  home  literacy 
model  (Senechal,  2006),  formal  literacy  activities  predict  reading 
through  the  effects  of  letter  knowledge  and  phonological  aware¬ 
ness.  We  suggest  here  an  alternative  path  through  the  effects  of 
RAN. If RAN is an emergent literacy skill (Kim & Petscher, 2011)
that  predicts  reading  (Kirby  et  al.,  2010)  and  formal  home  learning 
activities  predict  both  RAN  (this  study;  see  also  Manolitsis  et  al., 
2012)  and  reading  (e.g.,  Manolitsis  et  al.,  2009;  Senechal,  2006), 
then  models  of  home  learning  environment  need  to  expand  in  order 
to  incorporate  the  mediating  role  of  RAN.  Nevertheless,  given  that 
Chinese  parents  engage  more  frequently  than  North  American  or 
European  parents  in  their  children’s  learning  (e.g.,  Cheung  & 
Pomerantz,  2011;  Deng  et  al.,  2015;  Y.  Pan,  Gauvain,  Liu,  & 
Cheng,  2006),  it  is  possible  that  the  effects  of  formal  home 
learning  activities  on  RAN  do  not  generalize  to  other  cultures. 
Certainly,  this  finding  needs  to  be  replicated  in  a  future  study.  In 
contrast  to  formal  home  learning  experiences,  SES  did  not  corre¬ 
late  significantly  with  RAN.  SES  correlated  significantly  with 
informal  home  learning  experiences  (more  educated  parents  tend 
to  bring  home  more  books  and  read  more  often  to  their  children 
than  less  educated  parents),  but  informal  home  learning  experi¬ 
ences  did  not  predict  RAN  either. 

4 An alternative explanation may be that the names of objects are longer
than the names of digits, and this may result in longer times (this is true
also in our study). However, a recent study by Altani et al. (2016) has
shown that even when the stimuli across tasks are matched in terms of
length, RAN objects continue to produce longer times. A more plausible
explanation relates to the fact that digits and letters represent closed sets,
which are highly frequent and familiar. Because of the greater variety of
objects, they will always differ from the “overlearned” letters and digits.
We thank an anonymous reviewer for this explanation.

Our findings have important theoretical and practical implications.
In terms of theory, we have shown that not all subprocesses
are unique correlates of RAN in this sample of Chinese kindergarten
children. Although we cannot exclude the possibility that
the nonsignificant effect of attention, conceptual processing, STM,
and speed of processing is due to their overlap with the subprocesses
that survived as predictors of RAN, our findings do show
that  some  subprocesses  (i.e.,  visual  processing,  phonological  pro¬ 
cessing,  and  articulation)  are  more  important  to  RAN  than  others. 
Interestingly,  these  are  the  same  subprocesses  identified  by  some 
researchers  to  underlie  the  RAN-reading  relationship.  For  exam¬ 
ple,  Georgiou  et  al.  (2013)  have  argued  that  RAN  and  reading  are 
related  because  both  require  serial  processing  and  oral  production 
of  the  names  of  the  stimuli.  Furthermore,  the  finding  that  the 
subprocesses  of  serial  RAN  and  discrete  naming  are  practically  the 
same  while  the  connection  between  RAN  and  reading  is  stronger 
than  the  connection  between  discrete  naming  and  reading  (e.g., 
Georgiou  et  al.,  2013;  Logan  &  Schatschneider,  2014)  suggests 
that  the  format  of  the  tasks  is  a  significant  parameter  of  the 
RAN-reading  relationship.  Specifically,  when  stimuli  are  pre¬ 
sented  all  at  once,  this  allows  children  to  process  subsequent  items 
while  articulating  the  current  item  (Protopapas  et  al.,  2013).  In  line 
with  this  argument,  recent  eye-movement  studies  have  shown  that 
dyslexic  children  extract  less  parafoveal  information  when  naming 
stimuli  than  normal  readers  (e.g.,  J.  Pan,  Yan,  Laubrock,  Shu,  & 
Kliegl,  2013;  Yan,  Pan,  Laubrock,  Kliegl,  &  Shu,  2013).  From  a 
practical  point  of  view,  our  findings  suggest  that  if  we  were  to 
improve  RAN  performance,  we  need  to  adopt  a  multicomponential 
approach  that  focuses  on  the  most  important  subprocesses  in¬ 
volved  in  RAN  (i.e.,  visual  processing,  phonological  processing, 
and  articulation).  The  findings  of  some  recent  studies  are  promis¬ 
ing  (e.g.,  Pecini  et  al.,  in  press;  Wolff,  2014),  but  more  work  is 
needed  in  this  direction. 

Some  limitations  of  our  study  are  worth  mentioning:  First,  we 
assessed  the  cognitive  processes  using  single  measures.  This 
may  result  in  an  underestimation  of  the  aspects  of  each  con¬ 
struct.  Unfortunately,  given  the  amount  of  time  we  were  given 
for  testing,  we  could  not  assess  children  on  more  tests.  Second, 
information  on  HLE  was  collected  by  sending  out  a  self-report 
questionnaire  to  the  parents.  The  request  to  indicate  frequency 
of  reading  at  home  and  frequency  of  teaching  character  recog¬ 
nition  and  writing  is  subject  to  a  social  desirability  bias,  if 
parents  attach  a  high  value  to  these  aspects  of  HLE.  Third,  our 
sample  included  Chinese  kindergartners  who  had  received  some 
formal instruction on mathematical skills such as understanding
the number line and number identification. This has implications for
how we interpret the results with RAN digits as well as the role
of semantic processing (e.g., saying “big” to numbers 7 and 9,
and “small” to numbers 1, 2, and 3), for which children had to
have some knowledge of the approximate location of a given
number on the number line. Fourth, the regression approach followed
in our analyses only reveals which variables have a
simple,  linear  relationship  with  RAN.  Other  variables  may  have 
a  nonlinear  relationship,  or  a  significant  interaction.  Finally,  our 
study  was  correlational  in  nature  and  any  significant  effects  do 
not  imply  causation. 

In  conclusion,  our  findings  add  to  those  of  previous  studies 
examining  the  cognitive  correlates  of  RAN  by  showing  that  among 
all  the  subprocesses  implicated  in  Wolf  and  Bowers’  (1999)  and 
Georgiou’s (2010) naming models, only visual processing, phonological
processing, semantic processing (in the case of RAN
objects),  and  articulation  were  unique  correlates  of  RAN.  Future 
studies may examine whether our findings generalize to all RAN
tasks, at different points in time, and in languages with different
orthographic characteristics.

This longitudinal study examines the relative importance of counting ability, additive reasoning, and
working  memory  in  children’s  mathematical  achievement  (calculation  and  story  problem  solving).  In 
Hong Kong, 115 Chinese children aged 6 years old participated in 2 waves of assessments (T1 = first
grade  and  T2  =  second  grade).  Multiple  regression  analyses  showed  that  counting  ability  explained  a 
significant  amount  of  variance  in  T1  and  T2  calculation  beyond  the  effects  of  age,  IQ,  and  working 
memory,  in  which  conceptual  knowledge  of  counting,  but  not  procedural  counting,  was  a  unique 
predictor. However, counting ability did not contribute significantly to story problem solving at either time
point. Additive reasoning explained a substantial and significant amount of variance in calculation and
story  problem  solving  at  both  time  points  after  the  effects  of  age,  IQ,  working  memory,  and  counting 
ability  were  controlled  for:  Both  knowledge  of  the  commutativity  and  complement  principles  were 
unique  predictors.  Working  memory  also  accounted  for  a  significant  amount  of  variance  in  calculation 
and  story  problem  solving  at  both  time  points  beyond  the  influence  of  age,  IQ,  counting  ability,  and 
additive  reasoning.  Among  the  3  components  of  working  memory,  only  the  central  executive  was  a 
unique  predictor  for  all  measures  of  mathematical  achievement.  Autoregressive  analyses  provided  further 
evidence  for  the  strong  predictive  powers  of  additive  reasoning  and  working  memory.  Overall,  additive 
reasoning  accounted  for  the  greatest  amount  of  variance  in  mathematical  achievement  both  concurrently 
and  longitudinally.  This  finding  underscores  the  importance  of  additive  reasoning  in  children’s  mathe¬ 
matical  development. 

Keywords:  additive  reasoning,  counting  ability,  working  memory,  mathematical  achievement 


The  aim  of  this  study  was  to  investigate  the  relative  importance 
of  working  memory,  counting  ability,  and  additive  reasoning  in 
children’s  mathematical  achievement.  Mathematical  achievement 
has  an  influence  on  individuals’  performance  in  college  and  choice 
of  careers  (National  Mathematics  Advisory  Panel,  2008).  The 
mathematical  skills  and  knowledge  at  an  early  age  have  been 
shown  to  predict  mathematical  achievement  test  scores  in  both 
primary  and  high  schools  (e.g.,  Jordan,  Kaplan,  Ramineni,  & 
Locuniak,  2009;  Locuniak  &  Jordan,  2008;  Mazzocco  &  Thomp¬ 
son,  2005).  Thus,  providing  children  with  a  strong  foundation  of 
mathematical  competence  is  important  for  success  in  school  and 
beyond. 

Defining  Mathematical  Competence 

What does it mean to be competent in mathematics? The definition
of mathematical competence is important because it affects
the kinds of mathematical skills examined in this study. In general,
“competence”  indicates  sufficiency  of  knowledge  and  skills  that 
enable  a  person  to  act  in  a  wide  variety  of  situations.  To  illustrate, 
if  a  person  is  said  to  have  competence  in  a  particular  language,  she 
or  he  should  be  able  to  understand  and  interpret  oral  narratives  and 
written  texts  in  that  language.  She  or  he  should  also  be  able  to 
express  her-  or  himself  in  speech  and  in  writing.  Also,  a  person 
who  is  competent  in  a  language  can  read,  write,  listen,  and  speak 
about  different  things  and  in  different  ways  in  that  language.  By 
contrast,  a  person  who  can  only  listen  and  speak  in  a  language 
about  certain  topics  is  not  competent  enough.  This  analogy  with 
linguistic competence can help answer the
following question: What are the characteristics of a person who
can deal successfully with a wide range of situations that involve
mathematical thinking? Mathematical competence is the term that
we use to denote this collective and complex entity. In this article,
we start by reviewing two different theoretical perspectives on
children’s mathematics learning. Proponents of the different approaches
focus on different characteristics of a young
child who is mathematically competent. The task of the following
section  is  to  reflect  on  and  theoretically  analyze  different  perspec¬ 
tives  as  the  point  of  departure  of  our  work,  from  which  we 
hypothesized  the  essential  pillars  that  might  form  the  basis  of 
mathematical  competence. 

The  Number  Sense  Perspective 

“Number  sense”  has  been  considered  as  an  inborn  characteristic 
of children that forms the foundation for mathematics learning
(e.g., Dehaene, 1997; Gelman & Butterworth, 2005; Gelman &
Gallistel,  1978;  Siegler  &  Booth,  2005).  A  review  of  the  literature 
(Siegler  &  Booth,  2005)  suggests  one  definition  of  number  sense: 
“a  process  of  translating  between  alternative  quantitative  represen¬ 
tations”  (p.  198).  The  translations  can  be  between  the  representa¬ 
tions  of  spatial  and  numerical  information  (e.g.,  “About  how  many 
feet  wide  is  this  classroom?”),  the  representations  of  temporal  and 
tactile  information  (e.g.,  “tap  your  finger  once  every  5  s”),  and  so 
on.  It  has  been  argued  that  an  accurate  estimation  of  numerical 
magnitudes  is  the  basis  for  children  to  learn  mathematics. 

The  core  of  number  sense  seems  to  be  the  presence  of  a  mental 
number  line,  which  is  based  on  the  hypothesis  that  numbers  are 
arranged  spatially  on  a  continuum  (from  left  to  right  for  cultures 
using  left-to-right  orthographies;  Dehaene,  1997;  Siegler  &  Booth, 
2005).  One  crucial  characteristic  of  the  number  line  in  young 
children  is  that  it  is  a  fuzzy  representation  of  quantities  and  it  takes 
time  for  children  to  develop  the  mental  number  line  perfectly.  The 
form  of  mental  number  line  representations  has  been  measured 
with  a  number  line  estimation  task.  In  this  task,  participants  are 
presented  with  some  lines  with  a  number  at  each  end  (e.g.,  0  and 
10,  0  and  100,  0  and  1,000).  They  are  asked  to  estimate  the 
location  of  a  third  number  (e.g.,  42)  on  the  line.  This  task  is 
thought  to  be  measuring  the  mental  number  line  because  it  paral¬ 
lels  the  ratio  characteristics  of  the  number  system:  80  is  4  times 
greater  than  20,  so  the  distance  of  the  estimated  position  of  80  from 
0  should  be  4  times  greater  than  the  distance  of  the  estimated 
position  of  20  from  0.  Likewise,  the  distance  between  0  and  20 
should  be  the  same  as  the  distance  between  20  and  40,  80  and  100, 
140  and  160,  and  so  on,  which  gives  a  perfect  linear  function  of 
estimates  when  they  are  plotted  against  the  correct  place  on  the 
number  line. 

Although  this  kind  of  numerical  estimation  is  not  difficult  for 
most  adults,  it  takes  some  time  for  young  children  to  grasp.  It  has 
been  suggested  (e.g.,  Siegler  &  Booth,  2004;  Siegler  &  Opfer, 
2003)  that  young  children  have  difficulties  in  representing  the 
magnitudes  accurately  but  this  improves  with  age.  Young  children 
are  bad  at  estimating  the  position  of  small  and  large  numbers:  They 
overestimate  small  numbers  and  underestimate  larger  numbers, 
which  gives  a  logarithmic  description  of  their  estimates  when  they 
are  plotted  against  the  correct  place  on  the  number  line.  For 
example,  kindergartners  at  the  ages  of  5  and  6  were  found  to 
exhibit  a  clear  logarithmic  pattern  of  number  representation  (Sieg¬ 
ler  &  Booth,  2004).  It  was  also  indicated  that  half  of  the  first 
graders  at  the  ages  of  6  and  7  showed  logarithmic  patterns, 
whereas  the  other  half  fit  a  linear  pattern.  Second  graders  at  the 
ages  of  7  and  8  revealed  numerical  representations  that  were  best 
fit  by  a  linear  function. 
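
The contrast between logarithmic and linear patterns of estimates can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the analysis procedure of the studies cited: the target numbers and both sets of estimates are invented, and the helper names `r_squared` and `best_fitting_model` are hypothetical. It fits a straight line to the estimates, once against the targets and once against the natural logarithm of the targets, and reports which fit explains more variance.

```python
import math

def r_squared(xs, ys):
    """R^2 of the least-squares line ys ~ a + b * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy ** 2) / (sxx * syy)

def best_fitting_model(targets, estimates):
    """Compare a linear fit with a logarithmic fit of the estimates."""
    linear_fit = r_squared(targets, estimates)
    log_fit = r_squared([math.log(t) for t in targets], estimates)
    label = "linear" if linear_fit >= log_fit else "logarithmic"
    return label, linear_fit, log_fit

# Invented data for a 0-100 number line (assumption, not real observations).
targets = [2, 5, 10, 20, 40, 60, 80, 95]
# A child who overestimates small numbers and compresses large ones.
log_child = [100 * math.log(t) / math.log(100) for t in targets]
# A child whose estimates track the true positions with small errors.
linear_child = [t + d for t, d in zip(targets, [1, -2, 2, -1, 3, -3, 2, -1])]

print(best_fitting_model(targets, log_child)[0])
print(best_fitting_model(targets, linear_child)[0])
```

Comparing the two R² values in this way mirrors the idea that a child's estimates are classified by whichever function describes them better; the classification rule used here (a simple comparison of R²) is itself an assumption.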

Why  might  numerical  magnitude  estimation  be  related  to  com¬ 
putational  proficiency?  Some  researchers  suggest  that  when  a 
person  solves  an  arithmetic  problem,  the  rote  verbal  representation 
of  the  answer  to  the  problem  and  an  approximate  representation  of 
the  answer’s  magnitude  will  be  activated  (Ansari,  2008;  Hanich, 
Jordan,  Kaplan,  &  Dick,  2001).  If  an  approximate  representation 
has  more  activation  strength  concentrated  on  the  right  answer  and 
the  numbers  around  it,  the  person  is  more  likely  to  retrieve  the 
correct  answer.  An  accurate  magnitude  representation  also  allows 
for  the  rejection  of  implausible  answers  and  recalculation  in  cases 
when  a  person  has  retrieved  implausible  answers.  In  contrast, 
approximate representations in which activation strength is more
widely distributed among different numbers are likely to lead to
the  retrieval  of  wrong  answers.  Thus,  according  to  this  view,  the 
ability  to  represent  magnitude  accurately  may  help  children  re¬ 
trieve  correct  answers  to  novel  addition  problems  (Booth  &  Sieg¬ 
ler,  2008;  Siegler  &  Ramani,  2009).  An  accurate  representation  of 
numerical  magnitudes  may  also  relate  to  the  development  of  a 
variety  of  computational  estimation  strategies  (e.g.,  Dowker, 
Flood,  Griffiths,  Harriss,  &  Hook,  1996).  It  was  found  that  chil¬ 
dren  whose  number  line  estimates  better  fit  a  linear  function  had 
better  performance  on  a  range  of  other  numerical  tasks,  such  as 
magnitude  comparison,  memory  for  numbers,  calculation,  and 
standardized  mathematical  achievement  tests  (e.g.,  Booth  &  Sieg¬ 
ler,  2006;  Geary,  2011;  Geary,  Hoard,  Byrd-Craven,  Nugent,  & 
Numtee,  2007). 

In  summary,  number  sense  is  one  approach  that  we  may  refer¬ 
ence  for  the  definition  of  mathematical  competence  in  this  study. 
It  is  one  essential  domain  of  mathematical  competence  in  young 
children  because  it  may  foster  the  growth  of  computational  facility. 
However,  the  definition  of  mathematical  competence  within  this 
theoretical  framework  is  limited  in  several  ways.  First,  according 
to this perspective, children are born with a fuzzy representation of
quantities  and  the  representations  become  more  accurate  with  age. 
Perceived  numerosity  is  hypothesized  to  provide  children  with  the 
foundation  for  understanding  number  words.  However,  every 
number  has  its  exact  meaning  that  is  not  simply  an  estimation 
(Sarnecka & Gelman, 2004). For instance, even very young children
understand that if they add one object to eight objects, they
will  no  longer  have  eight  objects.  It  is  obvious  to  them  that  “eight” 
is  not  the  same  as  “approximately  eight.”  This  theory  does  not 
explain  how  young  children,  from  the  starting  point  of  an  impre¬ 
cise  analogue  representation,  suddenly  come  to  understand  the 
precise  meanings  of  number.  The  number  sense  perspective  does 
not  define  what  a  number  is,  but  our  view  is  that  how  we  concep¬ 
tualize  “number”  is  important  in  mathematics  education  because  it 
affects  the  approach  that  we  use  to  teach  mathematics  to  children. 
We  will  come  back  to  what  may  constitute  the  “meanings  of 
‘number’”  from  another  perspective  shortly  in  this  article.  At  the 
moment,  we  consider  that  the  number  sense  perspective  does  not 
offer  a  good  account  of  the  concept  of  “number,”  which  should  be 
precise  and  fundamental  to  mathematics  learning. 

Second,  although  the  number  sense  perspective  may  explain 
variation  in  computational  proficiency  (e.g.,  Booth  &  Siegler, 
2006;  Geary,  2011;  Geary  et  al.,  2007),  it  is  difficult  to  understand 
from  this  perspective  how  magnitude  estimation  allows  us  to  solve 
problems  in  a  variety  of  mathematical  situations.  Consider  additive 
reasoning: different kinds of situations that involve part-whole
relations  have  been  identified  (Carpenter,  Hiebert,  &  Moser,  1981; 
De  Corte  &  Verschaffel,  1985,  1987;  Ginsburg,  1982;  Hudson, 
1983; Nesher, 1982; Stern, 1993; Svenson & Broquist, 1975;
Vergnaud,  1979,  1982).  Research  that  has  examined  children’s 
performance  on  these  types  of  problems  (e.g.,  Carpenter  et  al., 
1981;  Verschaffel,  1994)  has  shown  findings  that  challenge  the 
number  sense  perspective.  First,  children’s  accuracy  rates  for  prob¬ 
lems  that  require  the  same  calculation  are  different.  For  example, 
in  transformation  situations,  problems  with  the  final  quantity  un¬ 
known  are  significantly  easier  than  those  in  which  the  initial 
quantity  is  unknown.  Second,  when  the  situation  involves  the 
composition  of  two  quantities,  finding  the  whole  is  significantly 
easier than finding a part. Third, reference set problems are significantly
more difficult than other types of problems that involve
comparisons  even  when  the  quantities  in  the  problems  are  the 
same.  In  these  studies,  the  demands  for  arithmetic  computation 
(e.g.,  quantities  and  types  of  calculation)  are  controlled  for  while 
the  quantitative  reasoning  demands  vary.  Thus,  the  differences  in 
the  rates  of  correct  responses  are  not  likely  due  to  individual 
differences  in  estimating  numerical  magnitudes,  but  in  their  ability 
to  reason  about  the  relations  of  quantities  in  the  problems.  The 
number sense perspective is limited in the sense that it touches upon
quantities  only,  but  it  does  not  entertain  the  idea  that  children  need 
to  understand  relations  between  quantities  to  choose  relevant  arith¬ 
metic  operations  to  solve  a  variety  of  problems.  In  brief,  the 
number  sense  perspective  does  not  provide  a  basis  for  us  to 
understand  how  people  solve  mathematical  problems  in  different 
situations. 

In  summary,  the  number  sense  perspective  may  be  useful  for 
explaining  children’s  growth  in  computational  proficiency.  The 
ability  to  estimate  numerical  magnitudes  may  contribute  to  arith¬ 
metic  competence  through  the  development  of  a  variety  of  com¬ 
putational  estimation  strategies.  However,  this  theoretical  frame¬ 
work  suffers  several  limitations.  First,  it  is  not  clear  how  the  link 
between  an  imprecise  analogue  system  and  a  precise  system  of 
numbers  can  be  forged.  It  does  not  give  a  precise  conceptualization 
of the meanings of number. Second, it cannot explain how numerical
estimation could serve as a basis for solving mathematical problems in a
variety  of  situations.  Therefore,  it  appears  that  we  need  to  turn  to 
another  theoretical  approach  to  search  for  a  better  definition  of 
mathematical  competence. 

The  Mathematical  Thinking  Perspective 

The second view of children’s mathematics learning, which we
term the mathematical thinking perspective, focuses on how children
think about mathematics logically (e.g., Bryant, 1995; Carpenter
& Moser, 1982; Ginsburg, Klein, & Starkey, 1998; Nunes
&  Bryant,  1996,  2015;  Nunes,  Bryant,  Barros,  &  Sylva,  2012; 
Piaget,  1952;  Piaget  &  Inhelder,  1975;  Thompson,  1993,  1994; 
Vergnaud,  1997,  2009).  Mathematical  thinking  involves  the  un¬ 
derstanding  of  the  meanings  of  number.  The  development  of 
mathematical  thinking  is  to  some  extent  similar  to  language  learn¬ 
ing.  To  progress  in  mathematical  thinking,  children  need  to  learn 
mathematical  symbols  and  their  meanings  and  to  connect  them 
sensibly,  just  as  one  has  to  combine  words  sensibly  in  sentences. 
Quantitative  reasoning  involves  using  numbers  to  represent  quan¬ 
tities  and  relations  between  quantities  as  well  as  operating  on  the 
numbers  to  reach  conclusions  about  the  quantities  (Thompson, 
1993).  Thus,  one  core  intellectual  demand  to  understand  the  mean¬ 
ings  of  number  is  the  need  to  understand  relations  between  quan¬ 
tities,  rather  than  merely  understanding  things  in  isolation. 

The  nature  of  understanding  number.  In  the  context  of 
mathematical  thinking,  to  say  that  a  child  has  an  understanding  of 
number,  we  would  expect  some  demonstration  that  the  child  un¬ 
derstands  the  relational  meanings  of  number.  For  example,  Jean 
Piaget  (1952)  pioneered  this  view  when  he  argued  that  we  need  to 
examine  whether  children  understand  the  equivalence  between  sets 
to  credit  children  with  an  understanding  of  cardinality.  Suppose 
Mary  has  five  sweets  and  she  exchanges  with  Annie  one  sweet  that 
she  has  for  one  sticker.  If  Mary  understands  cardinality,  she  should 
know that, by the end of this exchange, she would have five
stickers without having to count. If Mary is able to count the
sweets and say there are five, but she does not know how
many stickers she has after sharing on a one-to-one basis, then, according
to the mathematical thinking perspective, we can only say that
Mary can count, but we cannot say that she understands cardinality.
In short, Piaget considered cardinality as the number that
relates one set of objects to other sets. If there are five objects in
this  set,  then  it  has  the  same  quantity  as  any  other  set  with  five 
objects. 

Another  crucial  aspect  of  the  nature  of  understanding  number, 
which  plays  an  important  part  in  Piaget’s  theory,  is  logical  infer¬ 
ences.  All  quantities  (e.g.,  number,  height,  temperature)  can  be 
arranged  in  a  particular  order  from  smaller  to  larger.  To  grasp  the 
nature  of  this  order,  we  have  to  master  a  fundamental  logical  rule 
called  transitivity.  If  Quantity  A  is  greater  than  Quantity  B,  and  B 
is greater than Quantity C, then it follows that A must also be greater
than C. Some children may only know that 3 is more than 2 and 2
is more than 1, but they cannot work out the relation between 3 and
1,  which  they  cannot  directly  compare.  According  to  the  mathe¬ 
matical  thinking  perspective,  these  children  are  demonstrating  an 
incomplete  understanding  of  the  relations  between  different  num¬ 
bers.  This  aspect  of  number  knowledge  is  known  as  the  ordinal 
concept  of  number. 

The  cardinal  and  ordinal  concepts  of  number  are  requirements 
for the most basic mathematical activity of all: counting. However,
Piaget’s list of logical requirements goes further than this. He
contends  that  all  mathematical  procedures  have  their  own  logical 
demands.  For  example,  soon  after  children  have  learned  to  count, 
they  start  to  learn  addition  and  subtraction,  and  then  multiplication 
and  division  later  at  school.  Proponents  of  the  mathematical  think¬ 
ing  perspective  argue  that  it  is  important  for  children  to  learn  about 
the  connections  between  these  operations.  One  obvious  connection 
is  inversion.  This  is  the  idea  that  each  operation  has  its  converse. 
For  example,  the  inverse  relation  to  addition  is  subtraction,  and 
vice  versa;  the  inverse  of  multiplication  is  division,  and  vice  versa. 
The  understanding  of  the  inversion  principle  is  a  fundamental 
aspect  of  learning  about  number.  Piaget  (1952)  argues  that  it  is  not 
possible  to  grasp  the  “additive  composition  of  number”  without 
understanding  the  inversion  principle.  Additive  composition  of 
number  refers  to  the  fact  that  numbers  are  made  up  of  other 
numbers.  For  example,  9  consists  of  4  and  5  or  6  and  3,  and  it 
follows  that  if  you  subtract  6  from  9,  you  will  be  left  with  3.  Piaget 
argues  that  it  is  not  sufficient  for  children  to  know  or  be  able  to 
calculate that 5 + 3 = 8 and that 8 - 3 = 5; they must also realize
why  each  of  these  relations  automatically  follows  from  the  other. 
According  to  this  view,  numbers  are  not  simply  a  series  of  words 
in  a  constant  order,  but  they  also  reflect  the  part-whole  logic  of  the 
number system: each number word encompasses the previous
ones additively (8 means 7 + 1, 6 + 2, 5 + 3, etc.). Nunes and
Bryant  (2015)  call  this  idea  the  “analytical  meanings  of  number” 
because  the  meaning  is  given  by  definitions  within  a  number 
system. 
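
The part-whole logic described above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names `additive_compositions` and `inversion_holds` are hypothetical and are not taken from the studies cited. The first function lists the pairs of positive whole numbers that make up a given number, and the second checks that each addition fact and its two subtraction facts follow from one another.

```python
def additive_compositions(n):
    """All ways to split n into two positive whole-number parts (order ignored)."""
    return [(a, n - a) for a in range(1, n // 2 + 1)]

def inversion_holds(whole, part_a, part_b):
    """Check that the addition fact and both subtraction facts agree."""
    return (part_a + part_b == whole
            and whole - part_a == part_b
            and whole - part_b == part_a)

# 8 is made up of 1 + 7, 2 + 6, 3 + 5, and 4 + 4.
pairs_for_8 = additive_compositions(8)
# Every composition implies its inverse: if 3 + 5 = 8, then 8 - 3 = 5 and 8 - 5 = 3.
all_consistent = all(inversion_holds(8, a, b) for a, b in pairs_for_8)
print(pairs_for_8, all_consistent)
```

The point of the sketch is that the subtraction facts need no separate storage: they are derived from the same part-whole pairs as the addition facts.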

These  particular  examples  make  the  point  that,  according  to  the 
mathematical  thinking  perspective,  children  need  to  grasp  certain 
logical  principles  to  do  well  in  mathematics.  Examples  of  the 
relational  meaning  of  number  involve  the  cardinal  and  ordinal 
concept  of  numbers  and  the  inversion  principles.  It  is  also  reason¬ 
able  to  suggest  that  understanding  the  additive  composition  of 
number may contribute to the development of a more accurate
estimation of numerical magnitudes on a number line. It is possible
that  once  children  have  understood  the  additive  composition  of 
number,  it  would  be  easier  for  them  to  represent  the  relative 
magnitudes  of  quantities  and  numbers.  In  other  words,  understand¬ 
ing  relations  may  actually  support  the  development  of  numerical 
magnitude  representation.  Therefore,  compared  with  the  number 
sense  account,  the  mathematical  thinking  perspective  appears  to 
give  a  more  comprehensive,  precise,  and  parsimonious  model  that 
captures  the  fundamental  concept  of  number. 

Mathematical  thinking  and  computational  proficiency. 

Why  is  grasping  the  nature  of  number  important  in  learning 
mathematics?  One  possible  reason  is  that  an  understanding  of  the 
analytical  meaning  of  numbers  contributes  to  the  success  in  cal¬ 
culation.  According  to  the  mathematical  thinking  perspective, 
arithmetic  is  the  study  and  use  of  relations  between  numbers  to 
come  to  conclusions  and  this  is  always  carried  out  using  a  number 
system,  which  has  specific  characteristics.  One  characteristic  is  the 
inverse  relation  between  addition  and  subtraction. 

The  inversion  principle  may  underlie  the  understanding  of  the 
exchanges  in  addition  and  subtraction  of  multidigit  numbers.  Some 
researchers  (e.g.,  Fuson,  1990;  Nunes  &  Bryant,  1996)  have  sug¬ 
gested  that  understanding  carrying  and  borrowing  demands  the 
knowledge  of  the  inverse  relation  between  addition  and  subtrac¬ 
tion.  For  example,  Fuson  (1990)  argued  that  when  we  are  adding 
eight  tens  and  seven  tens,  to  understand  the  “ten-for-one  to  the  left 
exchange,”  we  have  to  recognize  that  we  are  taking  100  away  from 
the  tens  place  and  adding  100  to  the  hundreds  place,  so  that  the 
total  value  is  not  changed.  We  also  need  similar  reasoning  to 
subtract  63  from  1,657:  To  understand  the  conservation  of  the 
minuend,  we  need  to  understand  that  taking  away  100  from  the 
hundreds  place  and  adding  100  in  the  form  of  10  tens  to  the  tens 
place  does  not  alter  the  quantity. 
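
The exchanges described above leave the total quantity unchanged, which can be checked with a short sketch. The Python below is an illustration under stated assumptions, not a model from the cited work: numbers are represented as hypothetical (hundreds, tens, ones) triples, and the helper names `value`, `carry_tens`, and `borrow_hundred` are invented for this example.

```python
def value(hundreds, tens, ones):
    """Total quantity represented by the three place-value columns."""
    return 100 * hundreds + 10 * tens + ones

def carry_tens(hundreds, tens, ones):
    """Trade each full group of 10 tens for 1 hundred (carrying)."""
    return hundreds + tens // 10, tens % 10, ones

def borrow_hundred(hundreds, tens, ones):
    """Trade 1 hundred for 10 tens (borrowing)."""
    return hundreds - 1, tens + 10, ones

# Eight tens plus seven tens gives 15 tens before any exchange.
before_carry = (0, 8 + 7, 0)
after_carry = carry_tens(*before_carry)

# 1,657 written as 16 hundreds, 5 tens, 7 ones; 5 tens are too few to take 6 tens away.
before_borrow = (16, 5, 7)
after_borrow = borrow_hundred(*before_borrow)

# After borrowing, subtract 6 tens and 3 ones column by column (1,657 - 63).
h, t, o = after_borrow
difference = value(h, t - 6, o - 3)
print(after_carry, after_borrow, difference)
```

In both directions the exchange adds 100 in one column and removes 100 from another, so `value` returns the same total before and after.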

An  understanding  of  the  inversion  principle  may  also  contribute 
to  the  use  of  a  computational  strategy  called  “indirect  addition,”  in 
which  children  can  use  additions  to  solve  subtraction  problems 
effectively if the numbers are close to each other. For example, to
solve “21 - 18,” children are less likely to make mistakes if they count up
from  18  to  21.  The  use  of  the  inverse  relation  between  addition  and 
subtraction  to  calculate  has  been  observed  in  oral  arithmetic.  For 
example,  Nunes,  Schliemann,  and  Carraher  (1993)  reported  two 
different  ways  street  vendors  in  Brazil  used  the  inverse  relation  to 
solve  computations.  Problems  about  change  were  commonly 
solved  with  indirect  addition.  For  instance,  when  someone  bought 
something  valued  80  Cruzeiros  and  paid  with  a  500  note,  a  child 
vendor  said,  “80,  90,  100.  420”  (Nunes  et  al.,  1993,  p.  25).  In  this 
case,  the  child  calculated  the  change  through  indirect  addition.  In 
another  example,  a  child  used  the  inverse  relation  effectively  in  a 
different way. He solved the problem 243 - 75 by simplifying the
problem: First, he subtracted 143 from 243, which left
100 - 75, and then he added 143 back to reach the answer. In
short,  if  children  understand  the  inverse  relation  between  addition 
and  subtraction,  they  are  able  to  think  of  various  ways  to  simplify 
an  arithmetic  problem  with  their  logical  understanding  of  number 
to  enhance  their  computational  proficiency  (Canobi,  2004;  Canobi, 
Reeve,  &  Pattison,  2003). 
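
The two strategies described in this passage can be written out as a minimal sketch. The Python below is illustrative only: the function names `indirect_addition` and `subtract_by_setting_aside` are hypothetical, counting up by ones is a simplification of how a child or vendor would actually count (the vendor in the example counted in tens and hundreds), and both functions assume the minuend is at least as large as the subtrahend.

```python
def indirect_addition(minuend, subtrahend):
    """Solve minuend - subtrahend by counting up from the subtrahend."""
    steps = 0
    current = subtrahend
    while current < minuend:
        current += 1   # one counting step upward
        steps += 1
    return steps

def subtract_by_setting_aside(minuend, subtrahend, set_aside):
    """Set part of the minuend aside, subtract from the rest, then add it back."""
    remainder = minuend - set_aside   # e.g., 243 - 143 = 100
    partial = remainder - subtrahend  # e.g., 100 - 75 = 25
    return partial + set_aside        # e.g., 25 + 143 = 168

print(indirect_addition(21, 18))                 # counting up from 18 to 21
print(indirect_addition(500, 80))                # change from a 500 note for an 80 purchase
print(subtract_by_setting_aside(243, 75, 143))   # the simplification of 243 - 75
```

Both functions return the same result as direct subtraction; what differs is the route, which relies on addition undoing subtraction.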

Compared  with  the  number  sense  perspective,  the  mathematical 
thinking  perspective  also  addresses  how  children  solve  arithmetic 
calculation. It suggests that reasoning about the relations between
numbers can be a basis of effective calculation. From this perspective,
arithmetic is not just about memorizing number facts. Instead,
the crux of successful arithmetic calculation
is the ability to understand the relational or analytical
meaning  of  number. 

Mathematical  thinking  and  solving  problems  in  different 
situations.  Up  to  now,  we  have  highlighted  the  importance  of 
understanding  the  analytical  meaning  of  number  in  arithmetic 
calculation.  Now  we  turn  to  what  Nunes  and  Bryant  (2015)  have 
called  the  “representational  meaning  of  number,”  which  is  about 
working out relations between quantities. Thompson (1994) highlights
the importance of a logical “comprehension of a situation”
(Thompson, 1994, pp. 187-188) in solving mathematical problems
in different situations. He argues that it is important to analyze the
underlying quantitative structures of mathematical problems: “a
prominent characteristic of reasoning quantitatively is that numbers
and numeric relationships are of secondary importance, and
do not enter into the primary analysis of a situation. What is
important is relationships among quantities” (Thompson, 1993, p.
165).

The  solution  to  many  story  problems  rests  upon  the  knowledge 
of  the  underlying  relations  between  the  quantities  in  the  problem. 
Occasionally, this set of relations is not obvious to problem solvers.
This applies to some story problems whose solutions rely on
the  understanding  of  the  inverse  relation  between  addition  and 
subtraction.  For  example,  a  change  problem  is  easy  when  the 
missing  information  is  the  result  of  the  change  (e.g.,  “David  had 
eight  books.  Then  Peter  gave  him  three  more  books.  How  many 
books  does  David  have  now?”)  because  the  action  in  the  story  and 
the  arithmetic  operation  required  to  solve  the  problem  are  directly 
related.  In  other  words,  a  problem  that  involves  a  change  that 
increases  the  quantity  can  be  solved  by  addition,  while  one  that 
decreases  the  quantity  can  be  solved  by  subtraction. 

In  contrast,  when  the  starting  situation  is  not  known  (e.g.,  “Alex 
had  some  cookies.  He  gave  three  cookies  to  his  mother  and  had 
eight  cookies  left.  How  many  cookies  did  he  have  before?”),  one 
must  decide  which  arithmetic  operation  to  use  for  calculation  on 
the  basis  of  the  information  about  the  change  and  its  end  result. 
This type of start-unknown problem is more difficult (e.g., Carpenter
et al., 1981; De Corte & Verschaffel, 1985, 1987; Ginsburg,
1982) because the relation between the action described in the
story and the operation is inverse; that is, a problem that involves
a change that decreases the quantity has to be solved by addition.
Thus,  students  must  understand  that  the  operation  “addition”  can 
be  conceived  as  the  inverse  of  “subtraction”  and  analyze  the 
quantitative  relations  underlying  the  problem  situation. 
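
The contrast between the two change problems can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the signed-change convention (a gain is positive, a loss is negative) are assumptions introduced here, not part of the cited studies.

```python
def result_unknown(start: int, change: int) -> int:
    """Direct case: the story action and the operation agree.

    A gain (positive change) is solved by adding; a loss by subtracting.
    """
    return start + change


def start_unknown(change: int, result: int) -> int:
    """Inverse case: undo the change to recover the starting quantity.

    A loss (negative change) has to be undone by adding it back.
    """
    return result - change


# David had 8 books and was given 3 more: the action (gain) matches addition.
david_now = result_unknown(start=8, change=+3)

# Alex gave away 3 cookies and had 8 left: the action is a loss,
# but the start is found by adding 3 back to 8.
alex_before = start_unknown(change=-3, result=8)
```

Both problems use the same numbers and have the same answer; only the second requires the solver to invert the operation suggested by the story.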

Verschaffel (1994) examined the difficulty of comparison problems,
which also demand inverse reasoning, but applied to relations
rather than operations. In one type of comparison problem, the
relation between quantities can be described as “more than” but the
problem solver has to think of its inverse to solve the problem, for
example, when the reference set is the missing quantity (e.g., “Pete
has  29  nuts.  He  has  14  more  nuts  than  Rita.  How  many  nuts  does 
Rita  have?”).  Verschaffel  asked  fifth  graders  in  Belgium  (aged 
about 11 years) to solve comparison problems in which the relation
was consistent with the operation (i.e., the relation was described
as “more than” and the operation to be used to solve the problem
was an addition, e.g., “Timothy has 29 cups. Jenny has 14 more
cups than Timothy. How many cups does Jenny have?”) or was
inconsistent  (i.e.,  the  relation  was  described  as  “more  than”  and  the 
operation  to  be  used  to  solve  the  problem  was  a  subtraction,  as  in 
the  problem  presented  above).  When  the  relation  and  the  operation 
were  consistent,  the  correct  rate  was  92.5%;  when  the  relation  and 
the operation were inconsistent, the correct rate was 72.5%. Because
the numbers involved in these problems are the same, the
ability to reason about the relation between quantities is likely to
explain the difference in accuracy. Therefore,
students have to learn not only that addition is the inverse of
subtraction and vice versa, but also that the relation “more than”
can be seen as the inverse of “less than” and vice versa.
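
The two comparison problems quoted above can be set side by side as follows. This is a sketch of the arithmetic only; the names stand for the quantities in the example problems.

```latex
% Consistent: "more than" is solved by addition
\text{Jenny} = \text{Timothy} + 14 = 29 + 14 = 43

% Inconsistent: "more than" must be inverted before calculating
\text{Pete} = \text{Rita} + 14 \;\Rightarrow\; \text{Rita} = \text{Pete} - 14 = 29 - 14 = 15
```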

In summary, according to the mathematical thinking perspective,
quantitative reasoning that is based on relations between
quantities is crucial in solving mathematical problems in a variety
of situations, whereas the numbers used to represent the quantities
are of secondary importance. Some additive reasoning situations
involve just quantities whereas others involve quantities and relations.
If a problem requires reasoning about relations, it is significantly
more difficult than a similar one that involves just quantities.
Children have to reason in a sophisticated manner about the
underlying structure of the quantitative relations in the story to
choose whether to add or subtract and solve the problem successfully.
Thus, compared with the number sense perspective, the
mathematical thinking perspective has an edge in providing
a good account of how children solve mathematical problems
in different situations.

Definition  of  Mathematical  Competence 

The mathematical thinking perspective focuses on the understanding
of the meanings of number: analytical and representational.
The analytical meaning of number is defined by a number
system, whereas the representational meaning refers to the use of
numbers to represent quantities (Nunes & Bryant, 2015). Of
the two approaches to children’s mathematics learning, the
mathematical thinking perspective appears to provide a better
theoretical framework for understanding mathematical competence in
children. As the National Mathematics Advisory Panel (2008)
recommended in its report, “the curriculum must simultaneously
develop conceptual understanding, computational fluency, and
problem-solving skills” (p. xix). Clearly, the number sense perspective
touches upon computational fluency only. By contrast, the
mathematical thinking approach addresses all three aspects of
mathematical achievement. Our view is that it is not sufficient to
say that a child who possesses a good sense of number is competent
in mathematics. A child who is competent in mathematics should
also have a good understanding of the meanings of numbers and
quantities. This understanding appears to support his or her ability
to excel in a variety of mathematical tasks. Therefore, we consider
that mathematical thinking is a better way to conceive of mathematical
competence.

Cognitive  Foundations  of  Mathematical 
Thinking  in  Children 

After defining mathematical competence as mathematical thinking,
we explore its cognitive foundations in children. What are the
pillars of mathematical thinking? What kinds of skills do children
need to possess to have mathematical competence? The skills
required for mathematical competence may not be the same for
children of different ages. In this study, we were interested in
studying children at the age of around 6, who have acquired some
skills in counting and are beginning to learn addition and subtraction.
Thus, we focus on these aspects in the following discussion.

Working  Memory 

It  seems  uncontroversial  that  learning  and  using  mathematics 
must  draw  on  some  general  cognitive  resources.  For  example,  to 
solve  the  following  problem — “David  had  8  books.  Then  Peter 
gave  him  3  more  books.  How  many  books  does  David  have 
now?” — we  need  to  (a)  pay  attention  to  the  information;  (b)  select, 
remember,  and  reason  about  the  relevant  parts  of  this  information; 
and  (c)  execute  arithmetic  operations  that  help  us  answer  the 
problem.  Likewise,  when  children  have  to  solve  a  calculation  or  an 
applied  problem,  they  have  to  keep  in  mind  the  information  in  the 
problem  and  the  steps  to  execute  the  solution,  while  monitoring 
what  they  have  done  and  what  remains  to  be  done. 

There are different theoretical models of working memory, such
as the Baddeley and Hitch model of working memory (Baddeley & Hitch,
1974), Engle’s model of controlled attention (Engle, Kane, &
Tuholski, 1999), and Miyake’s executive model (Miyake et al.,
2000). Different theoretical models lead to different definitions
of, and corresponding measures for, working memory in different
studies. Most previous studies that examined the connection between
working memory and mathematics learning in children used
the model proposed by Baddeley and Hitch (e.g., Cowan & Powell,
2014;  Gathercole  &  Pickering,  2000;  Holmes  &  Adams,  2006;  Keeler 
&  Swanson,  2001;  Lee,  Ng,  Ng,  &  Lim,  2004;  Lehto,  1995;  Noel, 
Seron,  &  Trovarelli,  2004;  Swanson  &  Beebe-Frankenberger,  2004; 
Wilson  &  Swanson,  2001).  Within  this  model,  working  memory  has 
been  defined  as  “a  brain  system  that  provides  temporary  storage 
and  manipulation  of  the  information  necessary  for  .  .  .  complex 
cognitive tasks” (Baddeley, 1992, p. 556). It provides a unified and
parsimonious theoretical framework that comprises three key components:
the phonological loop, the visuospatial sketchpad, and
the  central  executive.  The  phonological  loop  serves  to  hold  speech- 
based  information  temporarily,  whereas  the  visuospatial  sketchpad 
holds  visual  and  spatial  information  for  a  short  period  of  time.  The 
central  executive  component  is  responsible  for  focusing,  dividing, 
and  switching  attention,  which  provides  an  overall  monitoring  and 
regulation  of  the  entire  working  memory  system  and  coordination 
of the activities among different components in the system. In the
present study, we drew on Baddeley’s working memory
model (Baddeley & Hitch, 1974; Baddeley, 1992) to test the
relation between working memory and children’s mathematical
achievements.

The evidence regarding the close connection between working
memory and children’s mathematical achievements is well established
(e.g., Alloway & Alloway, 2010; Bull, Espy, Wiebe, &
Andrews,  2008;  Geary,  1993;  Huttenlocher,  Jordan,  &  Levine, 
1994;  Rasmussen  &  Bisanz,  2005;  Swanson,  2011;  Welsh,  Nix, 
Blair, Bierman, & Nelson, 2010). However, it is likely that working
memory is important for learning and performance across all
academic domains. Thus, the relation between working memory
and mathematical achievements may not be specific. The nonspecificity
of working memory suggests that, to understand what
factors predict children’s success in mathematics learning, we need
to look at other abilities that are more specifically related to
mathematics. It is reasonable to speculate that these domain-specific
abilities would explain variation in children’s mathematical
achievements beyond general cognitive resources, such as
working memory.

Counting  Ability 

Counting is one type of domain-specific ability central to
children’s mathematical thinking because learning to count provides
children with words to represent quantities. This activity
helps children reflect upon and develop the logical concepts of
one-to-one correspondence, ordinality, and cardinality. The definitions
of conceptual knowledge of counting vary in the literature.
In  the  following,  different  ways  of  conceptualizing  counting  are 
explored  because  they  influence  how  we  interpret  the  findings 
from  research  that  examines  the  connection  between  counting  and 
children’s  mathematical  achievement. 

One  popular  theory  about  counting  was  proposed  by  Gelman 
and  Gallistel  (1978).  The  researchers  suggest  three  “how-to-count” 
principles  that  are  necessary  for  correct  counting,  including  the 
“one-to-one  correspondence,”  “stable  order,”  and  “cardinality” 
principles. The one-to-one correspondence principle refers to the
understanding that each object in an array must be tagged with one
and only one label. The stable order
principle requires the person who counts to choose tags that
correspond to items in an array in a stable order, which should stay
the same regardless of the number of items. The cardinality principle,
according to Gelman and Gallistel (1978), is defined as the
understanding that the number tag assigned to the final item in an
array represents the total quantity of the set. Understanding the
cardinality principle may underlie the use of more efficient counting
strategies to solve problems. For example, children who understand
cardinality can use the “first” procedure to solve an
arithmetic problem such as “8 + 4.” If they know the cardinal
value  of  the  first  number,  they  can  use  this  number  as  the  shortcut 
to  count  more  efficiently:  They  would  start  from  “8”  and  count  “8, 
9,  10,  11,  12”  to  solve  “8  +  4,”  rather  than  start  all  the  way  from 
“1”  and  count  “2,  3,  4,  5,  6,  7,  8,  9,  10,  11,  12”  to  reach  the  answer. 
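
The efficiency difference between the two strategies can be made concrete with a short Python sketch. The function names `count_all` and `count_on` are hypothetical labels introduced here for illustration; each function returns the number words a child would say, following the "8 + 4" example above.

```python
def count_all(a: int, b: int) -> list[int]:
    """Count every unit of both addends, starting from 1."""
    return list(range(1, a + b + 1))


def count_on(a: int, b: int) -> list[int]:
    """Start from the cardinal value of the first addend, then count on b more."""
    return list(range(a, a + b + 1))


words_counting_on = count_on(8, 4)    # starts at 8, ends at the sum
words_counting_all = count_all(8, 4)  # starts at 1, ends at the sum
```

Both sequences end on the same number, but counting on requires far fewer number words, which is one reason it is regarded as the more efficient strategy.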

It  has  been  suggested  that  these  three  counting  principles  govern 
the counting behavior of young children (Gelman & Meck, 1983).
The proponents of this view suggest that young children’s conceptual
understanding of these essential features of counting precedes
their acquisition of counting procedures. However, there is
an  alternative  view  that  children  do  not  start  with  an  adequate 
understanding  of  the  counting  principles  when  they  count.  Instead, 
they  start  from  imitating  other  people’s  counting  behavior  and 
induce  some  common  features  of  counting  from  the  observation 
(Briars  &  Siegler,  1984;  Fuson,  1988).  These  common  features  are 
called “unessential” features of counting because they are not
necessary for correct counting. Briars and Siegler (1984) identified
four such unessential characteristics, including (a) standard direction
(items must be counted from left to right), (b) adjacency
(items must be counted contiguously), (c) pointing (items have to
be pointed at during counting), and (d) start-at-the-end (items must be
counted from one end of an array of objects). Although we do not
need to follow these rules to count correctly,
some children believe that these four features of counting are
necessary for it. This suggests that the conceptual understanding of
counting of some young children is still rigid and not yet fully
developed.

In  short,  these  researchers  suggest  that  conceptual  knowledge  of 
counting  refers  to  the  understanding  of  what  is  necessary  for 
correct counting. Children who have a thorough conceptual knowledge
of counting should be able to abide by the essential principles
and not mistake the unessential characteristics of counting for the
criteria for correct counting. It is necessary to respect each of these
principles because they are part of the analytical meaning of
number. Counting activity is important for children learning mathematics
because it helps children think about the meanings of
number.

However, it is not enough for children to know the individual
counting principles separately; they also need to coordinate
their knowledge of the principles to understand the analytical
meaning of number. For example, the rule that the last number word of a
counting sequence denotes the cardinal value of a set (cardinality
principle) should hold only when the counting follows the one-to-one
correspondence principle. If one skips an object in the middle
of the counting sequence, she or he should not say that the last
number word is the cardinal value of the set. There is evidence that
some  children  failed  to  coordinate  their  knowledge  of  the  counting 
principles  even  though  they  demonstrated  competence  in  reciting 
the  number  sequence  and  applied  it  to  objects  and  events  (e.g., 
Bermejo,  Morales,  &  deOsuna,  2004;  Freeman,  Antonucci,  & 
Lewis, 2000; Sarnecka & Gelman, 2004; Sophian, 1988). These
studies suggest that knowing how to count does not necessarily
imply a full understanding of number. For example, Bermejo and
colleagues (2004) observed that 4- and 6-year-old children
who could say that there are three items in a set when a person counts
forward did not necessarily understand that, if a person counts
backward from four and the last numerical label is “two,” this does
not mean that the set contains two objects in total. In this study,
some children were not aware of the contradiction between the two
answers: they said that the set contains three objects if you
count forward, whereas the same set contains two objects if you
count backward. This finding shows a lack of understanding of
the cardinality of numbers, because it is fundamental to the concept
of cardinality that two sets have the same cardinal if the items are
in one-to-one correspondence. Despite the findings from these
studies,  much  of  the  research  on  counting  analyzed  children’s 
knowledge  of  these  principles  separately  (e.g.,  Aunola,  Leskinen, 
Lerkkanen,  &  Nurmi,  2004;  Barrouillet,  Fayol,  &  Lathuliere, 
1997;  Koponen,  Aunola,  Ahonen,  &  Nurmi,  2007;  Passolunghi, 
Vercelloni,  &  Schadee,  2007). 

In  summary,  counting  is  a  useful  starting  point  from  which 
children  learn  to  develop  mathematical  thinking.  It  is  an  activity 
that  young  children  can  use  to  learn  the  ordinal  and  cardinal 
meanings  of  number,  but  it  takes  some  time  for  them  to  achieve  a 
full understanding of counting. Because counting is more specific
than working memory to mathematics learning, it is expected that
individual differences in counting ability would explain variation
in children’s mathematical achievements beyond general cognitive
capacities, such as working memory and general intelligence.
The measures of counting have to be chosen with care
so that they capture children’s true understanding of counting.
Therefore, to measure counting ability in this study, we used
various tasks, including (a) procedural counting (the ability to
correctly say a number-word sequence) and (b) conceptual knowledge
of counting, which refers to the awareness of Gelman and
Gallistel’s five counting principles as well as the ability to coordinate
different principles to determine the cardinal number of a
set.

Additive  Reasoning 

Another domain-specific ability that is important for young
children to learn mathematics is additive reasoning. Additive reasoning
is based on quantities connected by part-whole relations.
Two central properties of part-whole relations are (a) commutativity
and (b) the inverse relation between addition and subtraction
(some researchers called it the “complement principle,”
e.g., Canobi et al., 2003). Commutativity refers to the irrelevance
of addend order to the sum, that is, “a + b = c” implies “b + a =
c,” whereas the complement principle refers to the inverse relation
between addition and subtraction, that is, “a + b = c” implies “c -
a = b.” These two principles are considered important in children’s
mathematics learning because they contribute to the understanding
of the relational meanings of numbers and quantities. It is
clear that the mastery of additive reasoning requires an integration
of the principles: one should understand that the three quantities in,
for example, 3 + 4 = 7 can be expressed in four mathematical
relations, namely, 7 - 3 = 4, 4 + 3 = 7, 7 - 4 = 3, and 3 +
4 = 7, and that these four expressions can be deduced from each
other. A thorough understanding of the part-whole relations of
quantities involves the recognition that these expressions are essentially
describing the same relation.
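
The way a single part-whole relation yields four expressions can be sketched as follows. This is a minimal illustration; the function name `fact_family` is a hypothetical label introduced here, and the sketch simply applies commutativity and the complement principle to two parts.

```python
def fact_family(a: int, b: int) -> list[str]:
    """List the four expressions implied by the parts a and b and their whole."""
    whole = a + b
    return [
        f"{a} + {b} = {whole}",   # the original relation
        f"{b} + {a} = {whole}",   # commutativity
        f"{whole} - {a} = {b}",   # complement principle
        f"{whole} - {b} = {a}",   # complement principle applied to the other part
    ]


relations = fact_family(3, 4)
```

All four strings describe the same relation among 3, 4, and 7, which is the point of the integration described above.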

Is there any evidence regarding the connection between understanding
the relational meanings of number and quantities and
children’s mathematical achievement? Some cross-sectional
studies have addressed this question, and the findings
are inconsistent. For example, Bryant, Christie, and Rendu (1999)
examined children’s understanding of the inverse relation between
addition and subtraction. They compared the performance on
three-term inverse problems (e.g., 14 + 7 - 7) and matched
control problems (e.g., 9 + 9 - 4) in a group of 5- to 8-year-old
children. On the basis of factor analysis, they found that children’s
understanding of the inversion principle was not related to their
accuracy on calculation (addition and subtraction problems). In
contrast, Canobi (2004) investigated the associations between conceptual
knowledge and problem solving in 90 children aged 6 to 8
years. Conceptual knowledge was tested by a judgment task in
which children made and justified judgments about a puppet’s solving
of problems that involved part-whole relations. She identified patterns
of conceptual and problem solving profiles with cluster
analysis and found that advanced conceptual profiles were associated
with skilled problem solving. Nearly all children
who recognized part-whole relations used more efficient strategies,
such as retrieval and decomposition, to solve problems.
Children with a more advanced conceptual understanding of part-whole
relations also demonstrated higher accuracy and shorter solution
times than those with less advanced conceptual profiles.

Rasmussen, Ho, and Bisanz (2003) examined the use of the
inversion principle in 24 preschool children and 24 children in
Grade 1. They found that both preschool and Grade 1 children
showed evidence of understanding the inversion principle in a
fully quantitative manner. The researchers also demonstrated that
the relation between inversion understanding and calculation varied
with age. They found that the preschool children did not show
evidence of an association between their performance on inversion
problems and arithmetic calculation. However, they identified a
significant correlation between inversion understanding and accuracy
of arithmetic calculation in Grade 1 children. Gilmore and
Bryant (2006) used cluster analyses to analyze different patterns of
inversion understanding and calculation skills among 6- to 9-year-old
children. They identified three distinct subgroups, including
one group showing good inversion understanding and good calculation
skills, a second group demonstrating poor performance in
both inversion understanding and calculation, and a final group
having good inversion understanding but poor calculation performance.
This finding suggests that individual differences in conceptual
knowledge do not correspond with arithmetic competence
directly.

A few longitudinal studies have demonstrated that
quantitative reasoning predicts children’s later success in mathematical
achievement. Nunes and colleagues (2007) investigated
whether children’s quantitative reasoning measured at school entry
was a significant predictor of mathematical achievement 16
months later, which was assessed by the Standardized Achievement
Tasks, Mathematics Section. They found that quantitative
reasoning was a significant and specific predictor of children’s
mathematical achievement. The relation was specific because
quantitative reasoning remained a significant predictor after the
effects of general intelligence and working memory were statistically
controlled for. Nunes and colleagues (2012) conducted another
longitudinal study to evaluate whether quantitative reasoning
and arithmetic skills are independent predictors of children’s mathematical
achievement at older ages (Key Stage II at 11 years of
age and Key Stage III at 14 years of age). They found that
quantitative reasoning made a unique contribution to the prediction
of children’s mathematical achievement at 11 and 14 years over
and above the effects of age, general intelligence, working memory,
and arithmetic skills.

In  summary,  additive  reasoning  appears  to  be  important  for 
children  to  learn  mathematics.  Understanding  commutativity  and 
the  inverse  relation  between  addition  and  subtraction  are  part  of 
the  construct  of  additive  reasoning.  This  knowledge  seems  to  be 
distinct from and developmentally more advanced than the understanding
of ordinality and cardinality. Thus, it is expected that
individual differences in additive reasoning would explain variation
in children’s mathematical achievements beyond counting
ability  and  general  cognitive  capacities,  such  as  working  memory 
and  general  intelligence. 

The  Present  Study 

On  the  basis  of  the  literature  review,  several  research  gaps  are 
identified.  First,  from  the  mathematical  thinking  perspective,  it  is 
important  to  assess  the  conceptual  aspects  of  counting.  Learning  to 
count  matters  for  children  to  learn  mathematics  because  it  helps 
children  reflect  on  the  analytical  and  representational  meanings  of 
number.  Thus,  if  a  child  counts  without  understanding  what  she  or 
he  is  doing,  she  or  he  should  not  be  considered  as  mathematically 
competent  from  the  mathematical  thinking  perspective.  Related  to 
this  argument  is  that  procedural  tasks  alone,  such  as  counting 
number sequences, are not good indicators of counting ability
because  they  do  not  necessarily  reflect  children’s  understanding  of 
the  logic  of  counting.  Therefore,  measures  that  capture  children’s 
knowledge of the relations of counting to quantities, such as the
ability  to  identify  the  cardinality  of  a  set,  should  also  be  included 
in  research  studies.  However,  to  the  best  of  our  knowledge,  most 
predictive  studies  used  procedural  counting  as  the  sole  indicator  of 
counting  ability,  which  is  a  limitation  that  needs  to  be  addressed  in 
future  studies. 

Second,  the  relation  between  additive  reasoning  (as  measured  by 
the  knowledge  of  commutativity  and  the  inverse  relation  between 
addition  and  subtraction)  and  mathematical  achievement  remains 
unclear. There are mixed findings regarding its connection with
children’s calculation ability; however, there seems to be no study
that examines its relation to children’s ability to solve different
types of story problems. More research that incorporates both
calculation and story problem solving as the outcome measures is
needed because cognitive correlates of mathematical ability may
vary  across  mathematical  tasks  (Chong  &  Siegel,  2008;  Cowan  & 
Powell,  2014;  Hughes,  1981;  Geary,  Hoard,  Nugent,  &  Byrd- 
Craven,  2008). 

Third,  the  contribution  of  additive  reasoning  to  mathematical 
achievement has never been investigated in a non-Caucasian cultural
context. Given that the dominant language and other cultural
factors  may  differ  from  one  country  to  another  (Miller  &  Stigler, 
1987;  Miller,  Smith,  Zhu,  &  Zhang,  1995;  Miura,  Kim,  Chang,  & 
Okamoto,  1988),  it  is  important  to  evaluate  the  predictive  power  of 
various  cognitive  factors  on  children’s  mathematical  achievement 
in  a  different  culture. 

Overall, this study aims to test whether working memory, counting
ability, and additive reasoning contribute to mathematical
achievement in children of around 6 years of age. In summary, the
above analyses lead to the following hypotheses:

Hypothesis  1:  Counting  ability  is  important  in  children’s 
mathematics  learning  and  its  influence  is  independent  from 
that  of  general  cognitive  capacities,  such  as  working  memory. 

Hypothesis 2: Additive reasoning (as assessed by knowledge
of commutativity and the complement principle) is independent
from and more important than counting ability and general
cognitive capacities, such as working memory, in children’s
mathematics learning.

Hypothesis  3:  Working  memory,  as  a  domain-general  factor, 
makes a contribution to mathematical achievement, even
when children’s specific mathematical knowledge, such as
their knowledge of counting and additive reasoning, is
accounted for.

Method 

Overview  of  Research  Design 

To  address  the  hypotheses,  we  employed  a  longitudinal  design 
in  this  study.  According  to  Bradley  and  Bryant  (1983),  both 
longitudinal  and  intervention  studies  are  important  for  establishing 
a  causal  relation  between  variables.  Intervention  studies  can  be 
used  to  discern  the  causal  relation  between  certain  skills  and 
mathematical  achievement.  Through  this  type  of  design,  we  can 
find out, for example, whether training additive reasoning leads to
an improvement in these skills and in mathematical achievement. But
before we implement an intervention study, we need to identify
factors that are important for children’s mathematics learning.
Longitudinal  studies  give  us  a  good  opportunity  to  examine  the 
temporal order of events. It is important to know whether a
predictor precedes mathematical achievement because temporal
precedence is a necessary condition for determining a causal
relation between variables. Through statistical techniques, such as
multiple regression analysis, we can identify the direction and
strength of associations between a predictor and mathematical
achievement and compare the unique contributions of each predictor
to variation in the outcome. Thus, a longitudinal study is
considered an important first step in developing an intervention.

In this study, we used a longitudinal design, which spanned
around 10 months, to examine whether the main predictors (working
memory, counting ability, and additive reasoning) uniquely
predicted children’s mathematical achievement (calculation and
story problem solving). Working memory was defined as children’s
performance on four tasks: digit span forward (the phonological
loop), Corsi span (the visuospatial sketchpad), and counting recall
and digit span backward (the central executive). Counting ability
was operationalized as children’s procedural counting skills and
conceptual knowledge of counting, whereas additive reasoning was
operationalized as children’s understanding of the commutativity
and complement principles. All of these main predictors were
assessed at the first wave of data collection (Time 1 [T1], during
the first grade of the participating children). A number of control
variables, such as general intelligence and demographic
characteristics, were also measured at T1 to ensure that any
observed associations between predictors and outcome measures were
not due to an extraneous factor that may affect the relations.

The second testing occasion, Time 2 (T2; during the second
grade of the participating children), comprised two measures of
mathematical achievement and a measure of Chinese word reading.
To assess mathematical achievement, two measures, calculation and
story problem solving in the domain of addition and subtraction,
were used in this study for three reasons. First, both tasks are
commonly assessed in research and school. Second, children at this
age are expected to learn addition and subtraction in school.
Third, some researchers (Chong & Siegel, 2008; Cowan & Powell,
2014; Hughes, 1981; Geary et al., 2008) have shown that cognitive
correlates of mathematical ability may vary across mathematical
tasks, which supports the rationale to examine multiple indicators
of mathematical ability in this study. Mathematical achievement
was assessed at both T1 and T2. Assessing mathematical achievement
at both time points provides an opportunity to test whether the
main predictors would predict mathematical achievement
concurrently (T1) and longitudinally (T2). It also enables us to
assess whether these predictors remain significant predictors of
T2 mathematical achievement after the effects of T1 mathematical
achievement were statistically controlled for.
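
As a rough sketch of the logic in the preceding paragraph, the longitudinal test can be written as a single regression equation. The symbols are illustrative assumptions introduced here, not the authors' notation: Math denotes an achievement outcome (calculation or story problem solving), WM, Count, and Add denote the T1 predictors, and the vector c collects the control variables (age, nonverbal intelligence, and demographic factors).

```latex
\[
\mathrm{Math}^{T2}_{i} = \beta_0
  + \beta_1\,\mathrm{Math}^{T1}_{i}
  + \beta_2\,\mathrm{WM}_{i}
  + \beta_3\,\mathrm{Count}_{i}
  + \beta_4\,\mathrm{Add}_{i}
  + \boldsymbol{\gamma}^{\top}\mathbf{c}_{i}
  + \varepsilon_{i}
\]
```

Under this reading, a predictor "remains significant" when its coefficient (for example, \(\beta_4\) for additive reasoning) differs reliably from zero with T1 achievement and the controls already in the model.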

Chinese  word  reading  was  included  as  the  outcome  control 
measure  to  test  the  specificity  of  certain  variables  on  mathematical 
performance. If additive reasoning is specifically relevant to
mathematics learning, children’s performance on this task should
predict their success in mathematical tasks much better than in
nonmathematical tasks, such as Chinese word reading. By contrast,
general cognitive ability should correlate with both mathematical
and  nonmathematical  tasks  because  all  of  these  tasks  demand 
cognitive resources, such as working memory and general
intelligence. This kind of design has been adopted in some
longitudinal research on children’s reading (e.g., Bradley &
Bryant, 1983), but it is rare in studies that address children’s
mathematics learning (e.g., Nunes et al., 2012).

The independent contributions of each predictor measured at T1
to mathematical achievement at the second testing occasion were
assessed by taking into account the effects of age, nonverbal
intelligence, and demographic factors. For each child, the interval
between the first and second wave of assessments was between 9
and 11 months, with 10 months being the most common interval
(83%).

Participants 

One  hundred  fifteen  children  (61  boys,  54  girls)  studying  in 
three  primary  schools  in  Hong  Kong  participated  in  both  waves  of 
assessments  in  this  longitudinal  study.  All  of  these  children  spoke 
Cantonese  and  attended  the  first  year  of  primary  school,  with  a 
mean age of 76.32 months (SD = 2.81 months, range = 67.8 to
82.1 months), during the first wave of assessment. The mean age
of the children during the second wave of assessment was 86.34
months (SD = 2.81 months, range = 77.8 to 92.1 months). All
of  the  children  were  reported  to  have  intelligence  within  the  range 
accepted  as  normal  for  their  ages,  and  did  not  have  learning 
difficulties  or  emotional/behavioral  problems,  such  as  dyslexia, 
specific  language  impairments,  attention  deficits  and  hyperactivity 
disorders,  or  any  neurological  disorders. 

On the basis of previous studies (e.g., Canobi et al., 2003;
Gilmore & Bryant, 2006; Nunes et al., 2007, 2012), an a priori
power analysis (Cohen, 1988; G*Power 3.1; Faul, Erdfelder, Lang,
& Buchner, 2007) indicated that a minimum sample size of 91 was
needed to detect a medium effect size (using Cohen’s, 1988,
criteria) with an alpha of .05 and power of .80 using multiple
regression analyses; therefore, the current sample size was
considered sufficient.

The highest educational levels attained by the mothers of the
children in the sample were as follows: No schooling/preprimary
school level = 5.2%, primary school graduates = 20.8%, secondary
school graduates = 57.4%, and university graduates = 16.5%.
According  to  the  Hong  Kong  Population  Census  (Census  and 
Statistics  Department,  Hong  Kong  Government,  2011),  the  distri¬ 
bution  of  educational  attainment  (highest  level  attained)  was:  No 
schooling/preprimary  school  level  =  10%,  primary  school  gradu¬ 
ates = 19.2%, secondary school graduates = 46.6%, and university
graduates = 24.1%. Thus, the relative distribution of educational
levels was similar to that of the overall Hong Kong population, in
which the majority were secondary school graduates, whereas a
small proportion had received no schooling or had a preprimary
educational level.

Measures 

Working  memory.  Working  memory  was  assessed  with  four 
tasks,  including  (a)  digit  span  forward  (the  phonological  loop),  (b) 
digit  span  backward  (the  central  executive),  (c)  counting  recall  (the 
central executive), and (d) Corsi blocks (the visuospatial
sketchpad). There were two tasks for the central executive because
previous  research  showed  that  measures  of  the  central  executive 
were  stronger  predictors  of  children’s  mathematical  performance 
than  other  working  memory  measures  (e.g.,  Cowan  &  Powell, 
2014;  Gathercole  &  Pickering,  2000;  Holmes  &  Adams,  2006; 
Keeler  &  Swanson,  2001;  Lee  et  al.,  2004;  Lehto,  1995;  Noel  et 
al.,  2004;  Swanson  &  Beebe-Frankenberger,  2004;  Wilson  & 
Swanson,  2001). 

The working memory tasks were drawn from the Working Memory Test
Battery for Children (Pickering & Gathercole, 2001). In the digit
span  forward  task,  children  listened  to  a  series  of  single-digit 
numbers  and  were  asked  to  repeat  the  numbers  in  the  correct  order. 
All  digits  were  presented  at  a  rate  of  one  per  second.  The  series  of 
numbers  initially  consisted  of  two  numbers,  and  increased  by  one 
number  after  every  other  presentation,  to  a  maximum  of  nine. 
Children  were  given  one  point  for  each  sequence  correctly  re¬ 
called.  The  maximum  possible  score  for  this  task  was  16.  The 
internal consistency of this task was satisfactory (Cronbach’s α =
.81).
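
For readers unfamiliar with the statistic, the internal consistency values reported for each task can be computed from item-level scores roughly as sketched below. This is a minimal illustration of the standard Cronbach's alpha formula, not the authors' analysis code; the function name `cronbach_alpha` and the toy data are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per child, one column per item.
    Uses the standard formula
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across children.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each child's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 children x 2 items that always agree.
perfectly_consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]
alpha = cronbach_alpha(perfectly_consistent)
```

Values closer to 1 indicate that the items vary together across children, which is what "satisfactory" internal consistency refers to here.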

The  digit  span  backward  was  similar  to  the  digit  span  forward, 
except  that  the  children  were  asked  to  recite  the  numbers  back¬ 
ward.  In  counting  recall,  children  were  asked  to  count  the  triangles 
in  a  series  of  shape  arrays  and  then  to  recall  the  total  number  of 
triangles  in  each  series.  The  number  of  arrays  started  from  two  and 
increased  by  one  array  after  every  other  presentation  to  a  maxi¬ 
mum  of  nine.  The  total  number  of  correct  trials  was  used  as  an 
indicator  of  participants’  performance  on  these  tasks.  The  maxi¬ 
mum  possible  score  for  this  task  was  16.  The  internal  consistency 
of this task was satisfactory (Cronbach’s α = .89). The listening
recall  task  was  not  used  in  this  study  because  it  cannot  be  simply 
translated  into  Cantonese  without  a  proper  investigation  of  how  it 
works  in  this  language. 

The  Corsi  block  task  involved  nine  blocks  and  the  experimenter 
tapped  a  sequence  of  blocks  at  a  rate  of  one  per  second.  Then, 
children  were  asked  to  replicate  the  sequence.  The  sequence  in¬ 
volved  two  blocks  initially  and  increased  by  one  block  every  other 
presentation,  to  a  maximum  of  nine.  The  maximum  possible  score 
for  this  task  was  16.  The  internal  consistency  of  this  task  was 
satisfactory (Cronbach’s α = .83). For each of the above tasks,
there  were  two  trials  for  each  span  length  and  testing  was  termi¬ 
nated  when  a  child  failed  two  trials  of  the  same  length.  In  each 
task,  two  practice  items  were  given  to  the  children  and  no  feedback 
was  given  to  the  children  in  any  of  the  testing  trials. 
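
The scoring and discontinuation rule shared by the span tasks can be sketched as follows. This is an illustrative reading of the description above (span lengths 2 to 9, two trials per length, one point per correct trial, stop after two failures at the same length), not the published test's software; the function name `score_span_task` and the input layout are assumptions.

```python
def score_span_task(responses, min_len=2, max_len=9, trials_per_len=2):
    """Score a span task from per-length trial outcomes.

    responses: dict mapping span length -> list of booleans
               (True = sequence recalled correctly).
    One point is given per correct trial. Testing stops once a child
    fails both trials at the same span length, or when no further
    lengths were administered.
    """
    total = 0
    for length in range(min_len, max_len + 1):
        trials = responses.get(length)
        if trials is None:
            break  # this length was never administered
        correct = sum(1 for ok in trials[:trials_per_len] if ok)
        total += correct
        if correct == 0:
            break  # failed both trials of this length: discontinue
    return total


# Hypothetical child: passes both trials at length 2, one at length 3,
# then fails both at length 4, so length 5 is never counted.
example = {2: [True, True], 3: [True, False], 4: [False, False], 5: [True, True]}
example_score = score_span_task(example)
```

With eight span lengths and two trials each, a child who recalls every sequence reaches the stated maximum of 16.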

Counting ability. Counting ability was operationalized as (a)
children’s  ability  to  count  with  accuracy  (procedural  counting)  and 
(b)  their  ability  to  recognize  the  counting  principles  and  the  coor¬ 
dinated  use  of  various  counting  principles  (conceptual  knowledge 
of  counting). 

Procedural  counting  was  assessed  with  two  tasks:  oral  rote 
counting  and  object  counting.  In  oral  rote  counting,  children 
counted  some  numerical  sequences  verbally  in  ascending  and 
descending  orders.  They  were  first  asked  to  count  from  5  to  16  as 
a practice trial. There were then eight testing trials in which
children were asked to count a set of numbers in ascending order
(e.g., 25 to 32, 56 to 63, 76 to 81, 118 to 123) and in descending
order (e.g., 46 to 38, 73 to 65, 34 to 27, 121 to 115). Testing
within a set was discontinued when a child had committed errors
on two sequences in that set. Children received one point for each
sequence completed correctly.

Another  task,  object  counting,  was  also  included  as  one  of  the 
measures  of  procedural  counting  to  test  whether  the  children  could 
count  correctly  using  one-to-one  correspondence  between  words 
and objects. In object counting, they were required to count two
trials of geometric shapes (e.g., circles, squares) and two trials of
recognizable objects (e.g., pens, rubbers). The numbers of objects
were six, nine, 13, and 15 for rubbers, pens, squares, and circles,
respectively. On any given trial, the objects were identical in
appearance. The order of task presentation was counterbalanced
across participants. Children received one point for each correct
count. The total score for procedural knowledge of counting
for each child was the sum of his or her scores on the oral
rote and object counting tasks. The maximum possible score was
12. The internal consistency of this procedural counting task was
satisfactory (Cronbach’s α = .71).

Conceptual knowledge of counting was assessed with a counting
judgment task adapted from previous work (e.g., Briars &
Siegler, 1984; Freeman et al., 2000; LeFevre et al., 2006). As
shown  in  Table  1,  three  types  of  trials  were  used:  (a)  correct  counts 
(four  trials),  (b)  incorrect  counts  (six  trials),  and  (c)  correct  but 
unusual  counts  (six  trials).  Thus,  children  evaluated  a  total  of  16 
counts.  On  each  trial,  a  set  of  objects  ranging  in  number  from  six 
to  12  was  shown  to  children.  In  this  task,  a  puppet  “Pika”  was 
introduced,  and  the  researcher  (the  author)  explained  to  the  chil¬ 
dren  that  Pika  was  just  learning  to  count. 

The same protocol designed by previous researchers (e.g.,
Freeman et al., 2000) was used:

This  is  Pika  and  he  would  like  you  to  play  a  counting  game  with  him. 
He  is  going  to  count  the  things  on  the  table.  But  he  is  just  learning  to 
count,  and  sometimes  he  makes  mistakes.  Sometimes  he  counts  in 
ways  that  are  okay,  but  sometimes  he  counts  in  ways  that  are  not  okay 
and  he  was  wrong.  Watch  carefully  while  he  counts.  When  he  has 
finished  counting,  you  tell  me  if  he  counted  okay  or  not  okay. 

Items  were  put  in  a  row  at  1-cm  intervals.  Pika  faced  each  child 
and  always  counted  one  item  per  second  from  the  child’s  left  to 
right except on the reverse direction trials. After each trial,
children were asked whether the count was okay or not okay (“error
detection”). Then, they were given 10 seconds to answer the
question that tested their understanding of cardinality: “How many
things are there in total?” The maximum possible score for both
“error detection” and “cardinality” was 16. The internal
consistency of the entire conceptual knowledge of counting measure
was satisfactory (Cronbach’s α = .85).

Additive  reasoning  (the  commutativity  and  complement 
principles).  Additive  reasoning  was  operationalized  as  chil¬ 
dren’s  understanding  of  the  commutativity  and  complement  prin¬ 
ciples.  The  commutativity  principle  refers  to  the  irrelevance  of 
addend  order  to  the  sum,  that  is,  “a  +  b  =  c”  implies  “b  +  a  =  c,” 
whereas  the  complement  principle  refers  to  the  inverse  relation 
between addition and subtraction, that is, “a + b = c” implies
“c - a = b.”

This study adapted a conceptual task used by Canobi et al. (2003),
which tested whether children could recognize conceptual
relations between pairs of addition/subtraction
problems.  In  general,  children  were  shown  a  puppet  that  was  going 
to  solve  two  problems,  namely  base  and  target  problems.  The 
puppet  “solved”  the  base  problem  by  counting  very  quickly  and 
told  the  answer  to  the  researcher,  who  then  told  the  children  that 
the  answer  was  correct.  After  that,  the  children  were  shown  a 
target  problem  and  were  asked  to  determine  whether  the  puppet 
needed  to  count  again  to  solve  the  problem  or  whether  the  puppet 
could  find  out  the  answer  by  “looking  back”  at  the  base  problem. 

All  of  the  problems  were  presented  as  story  problems  that 
involved  a  change  in  quantity  (e.g.,  “Mary  has  three  fish  and  her 
mother  gave  her  five  more”).  Change  problems  were  used  instead 
of  combine  problems  because  children’s  performance  on  the  com¬ 
bine  problems  reached  ceiling  in  the  pilot  test.  The  number  of 
words  in  each  problem  did  not  vary  considerably.  The  researcher 
presented  each  child  with  a  written  version  of  the  problem  as  it  was 
read  and  kept  it  in  front  of  the  child  until  the  problem  was  solved. 
For  example,  after  showing  the  base  and  target  problems  that  were 
printed  on  two  separate  cards,  the  researcher  asked, 

Now  look  at  these  two  problems.  If  we  gave  Pika  (the  puppet)  this 
problem  next  (pointing  to  the  target  problem),  do  you  think  Pika 
would  need  to  count  to  work  out  the  answer  or  could  Pika  look  back 
at  the  problem  he  has  already  done  (pointing  to  the  base  problem)? 

The conceptual judgment task involved two parts: a “warm-up
session” followed immediately by a “testing session.” In the
“warm-up session,” the children were given six practice problems
to familiarize themselves with the procedure. Half of the practice
problems were identical (e.g., base: 4 + 4 and target: 4 + 4) and
half of them were different (e.g., base: 4 + 3 and target: 6 + 7).
Children were given feedback on whether they were correct in
judging the same/different relation between the target and base
problems. The answers were 100% correct for all participants in
this session, indicating that they understood the task instructions.

Table 1

Counting Judgment Task (Adapted From LeFevre et al., 2006)

Type of trial          Trial         N            Description

Correct counts         1, 4, 13, 15  6, 8, 9, 12  Conventional left-to-right count

Incorrect counts (violations of word-object correspondence)
  Repeated words       2, 11         7, 10        Pika used an incorrect (repeated) number word that
                                                  did not correspond to an item (i.e., one, two, two)
  Skipped object       7, 16         6, 11        Pika missed counting an item in the regular sequence
                                                  and never returned to it
  Double count         8, 14         12, 8        Pika counted one item twice

Unusual counts (violations of conventional features)
  Reverse direction    3, 12         7, 12        Pika counted from the right to the left
  Start in the middle  5, 9          11, 9        Pika started counting in the middle of the set,
                                                  counted to the right end, and then went back to the
                                                  beginning to finish
  Double point         6, 10         9, 10        Pika hopped twice on an item but repeated the correct
                                                  number word twice (e.g., eight eight)

Note. Trial = position in the order of presentation; N = number of items for each trial.

In the “testing session,” the researcher showed six target
problems in random order after asking the puppet to solve the base
problem (e.g., “Mary has three fish and her mother gave her five
more”). The target problems included (a) an identity problem,
which was identical to the base problem (e.g., “Mary has three
fish and her mother gave her five more”) and (b) a different
problem, which was completely unrelated to the base problem
(e.g., “Mary has seven fish and her mother gave her two more”).
The identity and different problems were designed to detect
possible response biases, which may involve inattention, difficulty
in understanding the procedure, and random responses. The accuracy
rates for all the identity and different problems were 100%.

To  assess  children’s  knowledge  in  each  of  the  additive  reason¬ 
ing  principles,  two  types  of  items  were  used:  test  items  and  control 
items.  Examples  of  these  items  are  presented  in  Table  2. 

The  test  items  were  designed  to  assess  children’s  understanding 
of  a  particular  principle.  They  included  (a)  commutativity  test 
items,  which  were  related  to  the  corresponding  base  problems  on 
the  basis  of  the  commutativity  principle  (e.g.,  5  +  3)  and  (b) 
complement  test  items,  which  were  related  to  the  corresponding 
base problems according to the complement principle (e.g., 8 - 5).
Half  of  the  problems  had  sums  less  than  10  (small  number)  and 
half  of  them  had  sums  between  15  and  25  (large  number). 

Control items were included to detect whether children answered
the test items correctly merely because of response biases. For
example, children may correctly answer that “3 + 5 = 8” is helpful
for solving “5 + 3” just because they realize that the two numbers
in the base problem (i.e., 3 and 5) are present in the target
problem (5 + 3). These children may not understand the
commutativity principle but simply have a response bias to say
“yes” when the numbers are the same. Children with such a response
bias would also answer that “3 + 5 = 8” helps to solve the question
“5 - 3.” Thus, we included two types of control item in this study:
(a) commutativity controls, which involved subtraction items that
checked that the children did not simply ignore the operation when
making a judgment (e.g., 5 - 3), and (b) complement controls, which
involved addition problems that comprised the sum and one term of
the base problem added together (e.g., 8 + 5).

Thus,  the  control  items  did  not  serve  to  measure  the  constructs, 
but they were there to allow for a correction for response biases. A
child was credited with one point only if they answered both a test
item and its control item correctly. There were six commutativity
items and six control items for commutativity; if the child passed
one commutativity item and its control, the child was awarded one
point; otherwise, no points were awarded. Similarly, there were six
complement items and six control items for the complement
principle; if the child passed one complement item and its control,
the child was awarded one point; otherwise, no points were awarded.
The internal consistencies of the additive reasoning measures were
satisfactory (commutativity: Cronbach’s α = .81; complement
principle: Cronbach’s α = .85).
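
The bias-corrected scoring rule can be sketched in a few lines. This is an illustration of the rule as described above, assuming each test item is paired with exactly one control item; the function name `score_principle` and the list-of-booleans input format are hypothetical.

```python
def score_principle(test_correct, control_correct):
    """Bias-corrected score for one additive reasoning principle.

    test_correct / control_correct: equal-length lists of booleans,
    where position k holds the outcome for test item k and for its
    paired control item. A point is credited only when BOTH the test
    item and its control are answered correctly.
    """
    if len(test_correct) != len(control_correct):
        raise ValueError("each test item needs exactly one paired control item")
    return sum(1 for t, c in zip(test_correct, control_correct) if t and c)


# Hypothetical child with a "yes" bias: says the base problem helps on
# every item, so all six test items look correct but every control fails.
yes_bias_score = score_principle([True] * 6, [False] * 6)

# Hypothetical child who understands the principle on every pair.
full_score = score_principle([True] * 6, [True] * 6)
```

The design choice matters for interpretation: a child who always says the base problem helps scores zero rather than six, so the maximum of six per principle reflects understanding rather than a tendency to agree.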

General  intelligence.  Children’s  general  intelligence  was 
measured  with  Raven’s  standard  progressive  matrices  (Raven, 
Raven, & Court, 2003) at T1. This test was chosen because it
is a robust measure of the nonverbal aspect of intelligence and
has been used widely in previous research. It is a standardized test
including  five  sets  of  12  items  each.  Each  item  involves  a  target 
matrix  with  a  missing  piece.  Children  were  asked  to  choose,  from 
six  or  eight  alternatives,  the  best  figure  to  complete  the  target 
matrix.  One  mark  was  given  for  the  correct  answer  for  each  item. 

Demographic  characteristics.  Other  control  variables  in¬ 
cluded  demographic  information  reported  by  parents  in  a  question¬ 
naire,  namely  children’s  sex  and  mothers’  highest  education  level 
at T1.

Table 2

Types of Target Problems, Examples, and Their Purpose

Base problem: Mary has 3 fish and her mother gave her 5 more. How
many fish does Mary have now? (Answer: 3 + 5 = 8)

Type of corresponding target problem, followed by its purpose:

Commutativity test item: Mary has 5 fish and her mother gave her 3
more. How many fish does Mary have now?
  Purpose: To test children’s understanding of the commutativity
  principle. The base problem should be helpful to solve this item
  because the answer of “5 + 3” can be deduced by “3 + 5 = 8”
  according to the commutativity principle.

Commutativity control item: Mary has 5 fish and her mother took
away 3 from her. How many fish does Mary have now?
  Purpose: To allow for a correction for response biases. The base
  problem should not be helpful to solve this control item because
  the answer of “5 - 3” cannot be deduced by “3 + 5 = 8” according
  to the commutativity principle.

Complement test item: Mary has 8 fish and her mother took away 5
from her. How many fish does Mary have now?
  Purpose: To test children’s understanding of the complement
  principle. The base problem should be helpful to solve this item
  because the answer of “8 - 5” can be deduced by “3 + 5 = 8”
  according to the complement principle.

Complement control item: Mary has 8 fish and her mother gave 5 more
to her. How many fish does Mary have now?
  Purpose: To allow for a correction for response biases. The base
  problem should not be helpful to solve this control item because
  the answer of “8 + 5” cannot be deduced by “3 + 5 = 8” according
  to the complement principle.

Mathematical achievement: Calculation. Mathematics
achievement was measured by children’s performance on 16
simple calculation tasks and 32 story problems, all of which were
designed with reference to the curriculum guide developed by the
Hong Kong Education Bureau. Thirty items (15 addition and 15
subtraction) were constructed and tested in the pilot. On the basis
of pilot findings, 16 items (eight addition and eight subtraction)
were  selected  for  each  wave  of  data  collection  in  the  main  study. 
Of these 16 items, four were considered “easy” (average correct
rate: 70-100%), six were “moderate” in difficulty (average correct
rate: 40-70%), and six were “difficult” (average correct rate:
0-40%). At T1, children were orally presented with addition and
subtraction combinations that involved eight additions of numbers
up to 25 and eight subtractions from numbers less than 25 (i.e.,
6 + 7, 3 + 8, 2 + 6, 9 + 16, 7 + 4, 2 + 16, 14 + 4, 11 + 7, 7 - 5,
9 - 6, 6 - 4, 12 - 3, 21 - 16, 22 - 18, 25 - 6, 18 - 5). At T2,
children were orally presented with 10 addition and subtraction
problems with large numbers (24 + 4, 8 + 19, 7 + 23, 21 + 5,
9 + 19, 28 - 9, 31 - 8, 27 - 5, 28 - 19, 26 - 8), three single-digit
problems with three addends (3 + 9 + 2, 7 + 2 + 4, 8 + 5 + 2),
and three single-digit problems with three subtrahends (8 - 4 - 3,
13 - 3 - 8, 15 - 7 - 5). A printed version of each calculation
problem was presented as each problem was read and kept in full
view of the child during problem solving. Feedback was not
provided and no time limit was set. The maximum possible score
for calculation was 16. The measures appeared to have good
internal consistency (T1: α = .87; T2: α = .92).
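
The difficulty bands used to select pilot items can be sketched as a simple classification. Because the stated ranges share their endpoints (40% and 70% each appear in two bands), the handling of those boundary values below is an assumption, as are the function name `difficulty_band` and the sample rates.

```python
from collections import Counter


def difficulty_band(percent_correct):
    """Classify a pilot item by its average correct rate (0-100).

    Bands follow the text: easy 70-100%, moderate 40-70%, difficult 0-40%.
    Assumption: a rate of exactly 70 counts as easy and exactly 40 as
    moderate, since the source does not say how ties are resolved.
    """
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct >= 70:
        return "easy"
    if percent_correct >= 40:
        return "moderate"
    return "difficult"


# Hypothetical pilot correct rates for five items.
pilot_rates = [92.0, 71.5, 55.0, 38.0, 12.5]
band_counts = Counter(difficulty_band(rate) for rate in pilot_rates)
```

Tallying bands this way makes it straightforward to check that a selected set has the intended mix, such as the four easy, six moderate, and six difficult calculation items described above.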

Mathematical achievement: Story problem solving. Similarly,
eight types of word problems were tested in the same pilot study
as the calculation items. Thirty-two problems were chosen for the
main study. On the basis of Riley, Greeno, and Heller’s (1983)
classification of story problems, the T1 assessment included four result
unknown  change  problems,  four  start  unknown  change  problems, 
four  change  unknown  change  problems,  four  unknown  difference 
set  compare  problems,  four  unknown  compare  set  compare  prob¬ 
lems,  four  unknown  reference  set  compare  problems,  four  different 
unknown  combine  problems,  and  four  equalize  problems.  At  T2, 
the  number  of  each  type  of  problems  was  the  same,  except  that  two 
combine  problems  were  replaced  by  two  more  difficult  “decom¬ 
bine  transformations  problems”  (e.g.,  “John  played  two  games  of 
marbles.  In  the  second  game  he  lost  seven  marbles.  His  final  result, 
with  the  two  games  together,  was  that  he  had  won  three  marbles. 
What happened in the first game?”). For each type of problem,
half of them (i.e., two for each type) involved small numbers
(sum < 10), whereas half of them involved larger numbers (10 <
sum < 20). To reduce the working memory demands of the task,
the experimenter presented each child with a written version of
the story problem as it was read and kept it in front of the child
until the problem was solved. In this way, it was easier for
children to keep track of the contents and to make relevant
judgments accordingly. The maximum possible score for story problem
solving was 32. The measures appeared to have good internal
consistency (T1: α = .92; T2: α = .91).

Chinese  word  reading.  On  the  basis  of  the  Hong  Kong  Lex¬ 
ical  Lists  for  Primary  Learning  (Hong  Kong  Education  Bureau, 
2014),  a  word  recognition  task  was  constructed.  According  to  the 
corpus,  there  are  4,914  words  in  Key  Stage  I  (Grades  1  to  3).  Fifty 
words  from  this  corpus  were  chosen  for  the  pilot  test.  On  the  basis 
of  the  pilot  findings,  30  words  were  included  in  the  word  recog¬ 
nition  task  in  the  main  study.  Of  these  30  items,  eight  were  easy 
items  (average  correct  rate:  70-100%),  12  had  moderate  difficulty 
(average  correct  rate:  40-70%),  and  10  were  difficult  (average 
correct  rate:  0-40%).  The  items  were  arranged  from  the  easiest 
words at the beginning to the most difficult ones toward the end of
the test. In this task, children were shown written two-character
Chinese words and asked to read them aloud. One point was given for
each correct response. No feedback was given. The maximum
possible score for this task was 30. The internal consistency of this
measure was satisfactory (Cronbach’s α = .93).

Procedure 

This study was approved by a research ethics committee of the
university. Participating children were recruited through local
schools and nonprofit child-related community centers in Hong
Kong. Parents were informed of the study via letters sent home by
teachers and/or administrators. Upon receipt of parental consent,
the children were asked for verbal assent and participated
individually with the author in a quiet location, which was separate from
other children in the primary school or center. At T1 (first grade),
the children were tested in two 30- to 40-min sessions separated by
approximately 1 week. For all children, order of task presentation
was the same. The first session included Raven’s standard
progressive matrices, the central executive, phonological loop, and
visuospatial sketchpad tasks, as well as the tasks that assessed
children’s knowledge of the commutativity and complement
principles. The second session involved tasks that assessed procedural
and conceptual knowledge of counting, calculation, and story
problem solving. At T2 (second grade), the children were tested in one
session that lasted for approximately 20 to 30 min in which the
Chinese word reading, calculation, and story problem solving tasks
were administered. For each child, the interval between the first
and second wave of assessments was between 9 and 11 months,
with 10 months being the most common interval (83%). For all
children, testing was conducted by a researcher in Cantonese
during the day.

Results 

Preliminary  Analyses 

Descriptive statistics. Table 3 shows the descriptive statistics
for each variable. Of particular concern is whether the scores of the
outcome measures of mathematical achievement are normally
distributed. It is necessary to examine whether the normality
assumption of regression analysis is violated to evaluate whether
regression is an appropriate statistical tool to address the hypotheses of
this study (Cohen & Cohen, 1983). Thus, the distributions of
children’s scores on calculation and story problem solving were
analyzed with regard to the z values of skewness and kurtosis of
each outcome variable. The z value of skewness was calculated by
dividing the skewness value by its standard error, whereas the z value
of kurtosis was calculated by dividing the kurtosis value by its
standard error. Table 3 shows that none of the z values for these
outcome measures exceeds 1.96 in absolute value, suggesting that
the scores do not violate the normality assumption.
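The z values described above can be sketched in a few lines. The source does not state which standard error formulas were used; the sketch below assumes the bias-adjusted sample skewness and kurtosis and their conventional standard errors (the ones reported by common statistical packages). The function name and the data are hypothetical.

```python
from math import sqrt


def skew_kurt_z(x):
    """Return (z_skewness, z_kurtosis) for a list of scores."""
    n = len(x)
    mean = sum(x) / n
    sd = sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    sum_z3 = sum(((v - mean) / sd) ** 3 for v in x)
    sum_z4 = sum(((v - mean) / sd) ** 4 for v in x)
    # Bias-adjusted sample skewness and excess kurtosis (assumed estimators).
    skew = n / ((n - 1) * (n - 2)) * sum_z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Conventional standard errors, which depend only on sample size.
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


# Hypothetical, perfectly symmetric scores: the skewness z value is 0.
z_skew, z_kurt = skew_kurt_z(list(range(1, 10)))
```

An absolute z value above 1.96 would indicate a departure from normality at the .05 level under this approach.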

Examining  the  influence  of  demographic  variables. 

Demographic  variables  may  explain  differences  in  children’s 
scores  on  the  main  predictors  (working  memory,  counting  ability, 
and  additive  reasoning)  as  well  as  the  scores  in  mathematical 
achievement.  To  assess  the  effects  of  demographic  variables  on 
mathematical  achievement,  we  conducted  independent  t  tests  with 
children’s sex and one-way analyses of variance with mothers’
educational level and school separately as a fixed factor for each
predictor and each measure of mathematical achievement. No
variable showed evidence of a significant influence of children’s
sex, mothers’ educational level, or school. Intraclass correlations
were also calculated according to Cohen, Cohen, West, and Aiken
(2003) to examine whether there was any evidence of clustering.
All intraclass correlations ranged from 0.01 to 0.07, which were
close to 0. The very low within-cluster correlations suggest that
there is no clustering in the present data. Thus, these variables
were not included in the regression analyses.

Table 3

Descriptive Statistics for Domain-General Factors, Counting Ability, Additive Reasoning, and Mathematical Achievement (N = 115)

Variables                                     Reliability (α)  Possible range  M      SD    Minimum  Maximum  Skewness z  Kurtosis z
Domain-general factors
  Age in months                               NA               NA              76.32  2.81  67.8     82.1     -1.87       -.34
  Nonverbal intelligence: Raven’s raw scores  .81              0-60            19.45  2.94  15       26       1.52        -1.81
  Working memory: central executive
    Digit span backward                       .89              0-16            7.51   1.88  4        10       -.01        -1.12
    Counting recall                                            0-16            7.93   1.61  6        10       .06         -1.45
  Working memory: phonological loop
    Digit span forward                        .81              0-16            11.37  2.3   8        16       .5          -2.2
  Working memory: visuospatial sketchpad
    Corsi span                                .83              0-16            10.47  2.01  6        16       .65         -1.1
Counting ability
  Procedural counting                         .71              0-12            11.02  1.08  9        12       -3.39       -1.6
  Conceptual knowledge of counting
    Error detection                           .85              0-16            13.37  1.91  10       16       -.4         -.66
    Cardinality                                                0-16            14.23  1.14  12       16       -.13        -.74
Additive reasoning
  Commutativity principle                     .81              0-6             4.14   1.39  1        6        -3.57       .32
  Complement principle                        .85              0-6             2.05   1.54  0        5        1.8         -2.11
Mathematical achievement
  Time 1 calculation                          .87              0-16            11.03  2.85  5        16       -.09        -1.73
  Time 2 calculation                          .92              0-16            10.95  3.04  5        16       .39         -1.85
  Time 1 story problem solving                .92              0-32            22.38  4.43  14       32       1.21        -1.13
  Time 2 story problem solving                .91              0-32            23.63  3.57  16       30       -1.04       -1.17

Note. NA = not applicable.
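The clustering check described above can be sketched as follows. The source states only that intraclass correlations were calculated according to Cohen et al. (2003); the one-way ANOVA estimator used here is an assumption about the exact formula, and the function name and the example groups are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA intraclass correlation for a list of clusters."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average cluster size, which allows unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical scores from three schools with no within-school variation.
icc_demo = icc_oneway([[1, 1], [2, 2], [3, 3]])
```

Values close to 0, such as the 0.01 to 0.07 range reported here, mean that children from the same cluster are hardly more alike than children from different clusters.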

Associations between variables. In this study, composite
scores of some variables were created on the basis of theoretical
and empirical reasons (previous research and the correlations
between variables in the present data). According to Baddeley
(Baddeley & Hitch, 1974; Baddeley, 1992), working memory consists
of three components: the central executive, phonological loop, and
visuospatial sketchpad. The central executive has been commonly
measured by two tasks in previous studies: digit span backward
and counting recall. The correlation between the scores in these
tasks in the present sample was also significant (r = .43).
According to Cohen (1988), this correlation value indicates a moderate
effect (.30 < r < .50). Thus, a composite score was formed for
central executive by averaging the standardized scores of the
constituent measures. In subsequent analyses, the three
components of working memory were considered separately for
two reasons: First, in theory (Baddeley & Hitch, 1974; Baddeley,
1992), the central executive, phonological loop, and visuospatial
sketchpad are three related but separate components in working
memory. Second, most researchers have treated them as three
distinct factors in previous studies (e.g., Gathercole & Pickering,
2000; Holmes & Adams, 2006; Keeler & Swanson, 2001; Lee et
al., 2004; Lehto, 1995; Noel et al., 2004; Swanson &
Beebe-Frankenberger, 2004; Wilson & Swanson, 2001).
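The composite scoring described above (averaging standardized scores) can be sketched as follows. The scores are hypothetical, and the use of the sample standard deviation for standardization is an assumption, because the source does not specify it.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize raw scores to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*measures):
    """Average the standardized scores of several measures, child by child."""
    standardized = [zscores(m) for m in measures]
    return [mean(child) for child in zip(*standardized)]


# Hypothetical raw scores for five children on the two constituent tasks.
digit_span_backward = [4, 7, 8, 10, 6]
counting_recall = [6, 8, 7, 10, 9]
central_executive = composite(digit_span_backward, counting_recall)
```

Standardizing before averaging gives each task equal weight in the composite regardless of its raw score range.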

Three  tasks  were  used  to  measure  children’s  counting  ability: 
procedural counting, counting error detection, and cardinality
understanding. Theoretically, the latter two tasks explicitly measure
children’s  understanding  of  the  counting  principles,  and  in  the 
present  study,  the  correlation  between  the  scores  in  these  two  tasks 
was  moderate  and  significant  (r  =  .39).  Therefore,  a  composite 
score  that  represented  “conceptual  knowledge  of  counting”  was 
formed  by  averaging  the  standardized  scores  of  these  measures. 
Although  the  scores  of  procedural  counting  strongly  correlated 
with  the  composite  score  of  conceptual  knowledge  of  counting 
(r  =  .62;  Cohen,  1988;  r  >  .50  indicates  a  strong  correlation),  they 
were  considered  separately  in  subsequent  analyses  because  there  is 
evidence  that  some  children  failed  to  coordinate  their  knowledge 
of  the  three  counting  principles  in  these  tasks  even  though  they 
demonstrated competence in reciting the number sequence and
applying it to objects and events (e.g., Bermejo et al., 2004;
Freeman, Antonucci, & Lewis, 2000; Sarnecka & Gelman, 2004;
Sophian,  1988). 
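The effect size labels applied to the correlations above follow Cohen's (1988) benchmarks. A minimal sketch of computing a Pearson correlation and labeling its size is given below. The function names are hypothetical, and treating a value of exactly .50 as strong is an assumption, since the text gives only the ranges .30 < r < .50 and r > .50.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Label the magnitude of r using Cohen's (1988) benchmarks."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "negligible"


# Correlations reported in the text: r = .39 and r = .62.
labels = [cohen_label(0.39), cohen_label(0.62)]
```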

Additive reasoning was measured by children’s performance on
two tasks that assessed their understanding of the commutativity
and complement principles. The scores for the commutativity
knowledge were moderately correlated with the scores for the
complement knowledge (r = .37). Although they are related
constructs, they were considered separately in subsequent analyses
because some studies suggested that some children might master
the commutativity principle before they could do so for the
complement principle (e.g., Canobi et al., 2003; Torbeyns, Peters, De
Smedt, Ghesquiere, & Verschaffel, 2016).

The next set of analyses explores the correlations between the
main predictors (i.e., working memory, counting ability, and
additive reasoning) and each measure of mathematical achievement
at the two waves of assessment (T1 and T2). Table 4 shows the
bivariate correlations between the variables. Several key findings
are identified. First, the scores of central executive, digit span
forward, and Corsi span significantly correlated with each other.
This result is consistent with the theoretical model of working
memory (Baddeley & Hitch, 1974; Baddeley, 1992), in which these
three components are related to each other. However, the
correlations of the scores on these measures with mathematical
achievement varied. Children’s performance in the central executive
correlated moderately (all coefficients > .30) with the scores in
calculation and story problem solving at both T1 and T2. By
contrast, the scores of visuospatial sketchpad had no significant
correlation with any measure of mathematical achievement at either
time point. The scores of phonological loop had significant
correlations with calculation at T1 and T2, but did not correlate with
story problem solving at either time point. Second, both indicators
of the construct “counting ability,” procedural and conceptual
knowledge of counting, had significant correlations with children’s
performance in calculation concurrently and longitudinally.
However, only conceptual knowledge of counting correlated with
children’s performance in story problem solving. Finally, all measures
of additive reasoning had strong (all coefficients equal to or
larger than .50) and significant correlations with both calculation
and story problem solving at both time points.


Main  Analyses:  Multiple  Regression  Analyses 

Obtaining a significant correlation between a predictor
and mathematical achievement is not sufficient to conclude that
the contribution of that predictor is unique, because the different
predictors may share variance that relates to the measure of
mathematical achievement. Thus, we used multiple regression analyses
to examine the independent contributions of individual predictors
to an outcome variable. In the subsequent sections, sets of
fixed-order regression analyses are reported to assess the contributions
of working memory, counting ability, and additive reasoning to
explaining individual differences in calculation and story problem
solving at T1 and T2. Prior to each of the following analyses,
assumptions of regression analyses were checked and showed no
breaches concerning normality, linearity, homoscedasticity,
multicollinearity, and autocorrelation.

Concurrent  Predictions 

Outcome variable: Calculation. The first hypothesis of this
study states that counting ability is important in children’s
mathematics learning and its influence is independent from that of
general cognitive capacities, such as working memory. To test this
hypothesis, variables of counting ability were entered in the last
block of a regression model after age, IQ, and working memory
(the central executive, phonological loop, and visuospatial
sketchpad). Table 5 shows that counting ability explained an additional
5.4% of variance in T1 calculation beyond the effects of age, IQ,
and working memory. This finding supported the first hypothesis.
Conceptual knowledge of counting was an independent predictor
of T1 calculation in the final block (β = 0.191, t = 2.104, p <
.05). By contrast, procedural counting was not a significant
predictor (p > 0.05).
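The F change statistics reported for each block can be related to the R² values through the standard formula for a hierarchical regression step. The sketch below is an illustration, not the authors' analysis script; the function name is hypothetical and the inputs are the rounded values printed in Table 5.

```python
def f_change(r2_change, n_added, r2_full, df_residual):
    """F statistic for the R-squared increase when a block of predictors is added.

    r2_change:   increase in R-squared due to the new block
    n_added:     number of predictors in the new block
    r2_full:     R-squared of the model that includes the new block
    df_residual: residual degrees of freedom of that model
    """
    return (r2_change / n_added) / ((1 - r2_full) / df_residual)


# Counting ability block for T1 calculation (Table 5, Model 4):
# R2 change = .054, two predictors added, R2 = .243, df = (2, 107).
f_counting = f_change(0.054, 2, 0.243, 107)
```

With these rounded inputs the result is about 3.82, close to the reported 3.843; the small gap presumably reflects rounding of the published R² values.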


Table 4

Bivariate Correlations Among Standardized Variables (N = 115)

Variables                        1      2      3      4      5      6      7      8      9      10     11     12     13
1. Age in months                 1
2. IQ (Raven’s scores)           .33**  1
Counting ability
3. Procedural counting           .08    .01    1
4. Counting knowledge            .05    .05    .62**  1
Working memory
5. Central executive             .15    .16    .15    .13    1
6. Digit span forward            .12    .14    .18    .20*   .25**  1
7. Corsi span                    .02    .15    .12    .13    .24**  .29**  1
Additive reasoning
8. Commutativity knowledge       .11    .15    .16    .17    .10    .14    .04    1
9. Complement knowledge          .09    .15    .15    .16    .14    .02    .03    .37**  1
Mathematical achievement
10. T1 Calculation               .07    .13    .22*   .31**  .35**  .19*   .06    .50**  .54**  1
11. T2 Calculation               .08    .17    .24**  .29**  .42**  .25**  .14    .51**  .54**  .81**  1
12. T1 Story problem solving     .15    .18    .16    .20*   .33**  .04    .04    .56**  .56**  .55**  .64**  1
13. T2 Story problem solving     .09    .20*   .13    .21*   .35**  .10    .05    .52**  .62**  .57**  .68**  .75**  1

Note. T1 = Time 1; T2 = Time 2.
* Correlation is significant at the .05 level (2-tailed). ** Correlation is significant at the .01 level (2-tailed).

Table 5

The Additional Amount of Variance of Time 1 Calculation Explained by Counting Ability Beyond Age, IQ, and Working Memory (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .004  .004       .475       .492                  (1, 113)
2      Age in months                 .016  .012       1.368      .245                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .188  .172       7.694***   <.001                 (3, 109)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
4      Age in months                 .243  .054       3.843*     .024                  (2, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge

* Significant at the .05 level. *** Significant at the .001 level.


The second hypothesis of this study is that additive reasoning
(as assessed by knowledge of commutativity and the complement
principle) is independent from and more important than counting
ability and general cognitive capacities, such as working memory,
in children’s mathematics learning. To address this hypothesis,
variables of additive reasoning were entered in the final step of a
regression model after all the other factors, including age, IQ,
working memory, and counting ability, were controlled for. Table
6 shows that additive reasoning accounted for an additional 28.8%
of variance in T1 calculation beyond the influence of all the other
factors. Both commutativity knowledge (β = 0.313, t = 3.978,
p < .001) and complement knowledge (β = 0.34, t = 4.286, p <
.001) remained significant and independent predictors of T1
calculation in the final model. Thus, the second hypothesis of the
present study was strongly supported.

The third hypothesis of this study states that working memory
makes a contribution to mathematical achievement, even when one
accounts for children’s specific mathematical knowledge such as
their knowledge of counting and additive reasoning. To examine
this hypothesis, variables of working memory were entered as the
final step of a regression model after all the other factors, such as
age, IQ, counting ability, and additive reasoning. Table 7 shows
that working memory accounted for an additional 8% of variance
in T1 calculation after the effects of all the other factors were
controlled for. This finding supports the third hypothesis of the
present study. Among the variables in working memory, the
central executive was the only significant predictor of children’s
performance in T1 calculation (β = 0.298, t = 4.044, p < .001).
This shows that the central executive makes a unique contribution to
T1 calculation even when the influence of counting ability and
additive reasoning is controlled for. By contrast, the
contributions of phonological loop and visuospatial sketchpad to
children’s performance on T1 calculation were not statistically
significant.

Table 6

The Additional Amount of Variance of Time 1 Calculation Explained by Additive Reasoning Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .004  .004       .475       .492                  (1, 113)
2      Age in months                 .016  .012       1.368      .245                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .243  .226       6.394***   <.001                 (5, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
4      Age in months                 .530  .288       32.152***  <.001                 (2, 105)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge

*** Significant at the .001 level.

Table 7

The Additional Amount of Variance of Time 1 Calculation Explained by Working Memory Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .004  .004       .475       .492                  (1, 113)
2      Age in months                 .016  .012       1.368      .245                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .450  .434       21.308***  <.001                 (4, 108)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
4      Age in months                 .530  .080       5.966***   = .001                (3, 105)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
       Central executive
       Phonological loop
       Visuospatial sketchpad

*** Significant at the .001 level.

Outcome variable: Story problem solving. Is the concurrent
contribution of counting ability to T1 story problem solving
similar to that to T1 calculation? Table 8 shows that counting ability
did not explain a significant amount of variance in T1 story
problem solving (2% only). Neither variable of counting ability
made a significant contribution to T1 story problem solving (all
p values > 0.05). Therefore, the first hypothesis was not supported
in the analyses for T1 story problem solving.

The regression analyses for T1 calculation supported the second
hypothesis by demonstrating that additive reasoning was the
strongest predictor even after the influence of age, IQ, counting ability,
and working memory was controlled for. Can this finding be
replicated in T1 story problem solving? Table 9 shows that
additive reasoning explained a substantial and significant amount of
variance in T1 story problem solving beyond the effects of all the
other factors (38.8%). Both variables of additive reasoning made
unique contributions to accounting for the variance: commutativity
knowledge (β = 0.326, t = 4.203, p < .001) and complement
knowledge (β = 0.429, t = 5.568, p < .001). Therefore, this
finding concurs with that for T1 calculation, and the second
hypothesis was strongly supported.

Table 8

The Additional Amount of Variance of Time 1 Story Problem Solving Explained by Counting Ability Beyond Age, IQ, and Working Memory (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .024  .024       2.738      .101                  (1, 113)
2      Age in months                 .040  .016       1.907      .170                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .142  .102       4.328**    .006                  (3, 109)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
4      Age in months                 .162  .020       1.292      .279                  (2, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge

** Significant at the .01 level.

Table 9

The Additional Amount of Variance of Time 1 Story Problem Solving Explained by Additive Reasoning Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .024  .024       2.738      .101                  (1, 113)
2      Age in months                 .040  .016       1.907      .170                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .162  .122       3.128**    .011                  (5, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
4      Age in months                 .550  .388       45.246***  <.001                 (2, 105)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge

** Significant at the .01 level. *** Significant at the .001 level.

Table 10 shows that when variables of working memory were
entered in the last block after all the other factors were controlled
for, they continued to explain a significant amount of variance
(4.7%) in T1 story problem solving. Therefore, the third
hypothesis of the present study was supported by the findings for both
calculation and story problem solving at T1. Similar to T1
calculation, the only significant variable in working memory uniquely
accounting for variance in T1 story problem solving was the
central executive (β = 0.227, t = 3.148, p = .002).

Longitudinal  Predictions 

Outcome variable: Calculation. The first set of regression
analyses regarding the longitudinal predictions of children’s
performance in calculation concerns the unique contributions of
counting ability. Similar to T1 calculation, when entered after age,
IQ, and working memory (see Table 11), counting ability
accounted for a significant amount of variance in T2 calculation
(4.1%). However, presumably because of the shared variance of
T2 calculation explained by conceptual knowledge of counting and
procedural counting, neither variable was a unique predictor of
T2 calculation (p values > 0.05). Because counting ability as a
whole explained a significant amount of variance in T2 calculation
beyond the effects of age, IQ, and working memory, this finding
was considered as supporting evidence for the first hypothesis of
the present study.

Table 10

The Additional Amount of Variance of Time 1 Story Problem Solving Explained by Working Memory Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .024  .024       2.738      .101                  (1, 113)
2      Age in months                 .040  .016       1.907      .170                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .503  .463       25.155***  <.001                 (4, 108)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
4      Age in months                 .550  .047       3.666*     .015                  (3, 105)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
       Central executive
       Phonological loop
       Visuospatial sketchpad

* Significant at the .05 level. *** Significant at the .001 level.

Table 11

The Additional Amount of Variance of Time 2 Calculation Explained by Counting Ability Beyond Age, IQ, and Working Memory (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .006  .006       .722       .397                  (1, 113)
2      Age in months                 .034  .028       3.197      .076                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .247  .212       10.269***  <.001                 (3, 109)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
4      Age in months                 .288  .041       3.843*     .048                  (2, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge

* Significant at the .05 level. *** Significant at the .001 level.

The second set of regression analyses regarding the longitudinal
predictions of children’s performance in calculation concerns the
unique contributions of additive reasoning. Consistent with the
hypothesis, Table 12 shows that when variables of additive reasoning
were entered in the last step, they continued to account for a
substantial and significant amount of variance in T2 calculation (30%). The
independent contributions of commutativity (β = 0.333, t = 4.491,
p < .001) and complement knowledge (β = 0.334, t = 4.533, p <
.001) remained significant after all the other factors were controlled
for. Therefore, consistent with the findings for T1 calculation and
story problem solving, this evidence strongly supported the second
hypothesis.

Similar to the analyses on T1 calculation and story problem
solving, Table 13 demonstrates that working memory explained a
significant amount of variance in T2 calculation (11%) when the
effects of all other factors were taken into account. This finding
was consistent with the third hypothesis that working memory
makes a unique contribution to children’s mathematics learning
beyond specific mathematical knowledge, such as counting
ability and additive reasoning. Among the variables in working
memory, only the central executive was a unique predictor of T2
calculation (β = 0.321, t = 4.653, p < .001) when the effects of
all other variables were controlled for.

Table 12

The Additional Amount of Variance of Time 2 Calculation Explained by Additive Reasoning Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .006  .006       .722       .397                  (1, 113)
2      Age in months                 .034  .028       3.197      .076                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .288  .242       7.655***   <.001                 (5, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
4      Age in months                 .589  .300       38.32***   <.001                 (2, 105)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge

*** Significant at the .001 level.

Table 13

The Additional Amount of Variance of Time 2 Calculation Explained by Working Memory Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .006  .006       .722       .397                  (1, 113)
2      Age in months                 .034  .028       3.197      .076                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .478  .444       22.996***  <.001                 (4, 108)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
4      Age in months                 .589  .110       9.394***   <.001                 (3, 105)
       Nonverbal intelligence
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge
       Central executive
       Phonological loop
       Visuospatial sketchpad

*** Significant at the .001 level.

Outcome variable: Story problem solving. Because
counting ability did not correlate significantly with T2 story problem
solving, it is unlikely that it would make a unique contribution to
it when the effect of working memory is also controlled for.
Consistent with this prediction, Table 14 shows that counting
ability only accounted for 2% of variance in T2 story problem
solving beyond the influence of age, IQ, and working memory.
Neither conceptual knowledge of counting nor procedural counting
was an independent predictor (p values > 0.05). This finding is
consistent with that of T1 story problem solving in which counting
ability was also not a good predictor. Therefore, the first
hypothesis was not supported by the result for children’s performance in
story problem solving at either T1 or T2.

When additive reasoning was entered in the last step after all
the other factors were controlled for, Table 15 shows that it
continued to explain a large amount of variance in T2 story
problem solving (38.6%). This finding is consistent with those for
T1 and T2 calculation as well as T1 story problem solving, which
strongly support the second hypothesis regarding the
importance of additive reasoning in children’s mathematics learning.


Table 14

The Additional Amount of Variance of Time 2 Story Problem Solving Explained by Counting Ability Beyond Age, IQ, and Working Memory (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .008  .008       .865       .354                  (1, 113)
2      Age in months                 .042  .034       4.032*     .047                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .175  .133       5.836***   .001                  (3, 109)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
4      Age in months                 .195  .020       1.322      .271                  (2, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge

* Significant at the .05 level. *** Significant at the .001 level.

Table 15

The Additional Amount of Variance of Time 2 Story Problem Solving Explained by Additive Reasoning Beyond All the Other Factors (N = 115)

Model  Variables entered into model  R²    R² change  F change   Significant F change  (df)
1      Age in months                 .008  .008       .865       .354                  (1, 113)
2      Age in months                 .042  .034       4.032*     .047                  (1, 112)
       Nonverbal intelligence
3      Age in months                 .195  .152       4.051**    .002                  (5, 107)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
4      Age in months                 .581  .386       48.403***  <.001                 (2, 105)
       Nonverbal intelligence
       Central executive
       Phonological loop
       Visuospatial sketchpad
       Procedural counting
       Counting knowledge
       Commutativity knowledge
       Complement knowledge

* Significant at the .05 level. ** Significant at the .01 level. *** Significant at the .001 level.

Both commutativity knowledge (β = 0.351, t = 4.758, p <
.001) and complement knowledge (β = 0.406, t = 5.455, p <
.001) made significant contributions, independently of all the
other factors, to T2 story problem solving.

When the influence of all the other factors was controlled for
(see Table 16), working memory continued to explain a significant
amount of variance in T2 story problem solving (6.6%). The
central executive was the only variable in working memory that
made a unique contribution to T2 story problem solving beyond
the effects of age, IQ, counting ability, and additive reasoning (β =
0.278, t = 3.989, p < .001). These results were consistent with those
for T1 and T2 calculation as well as T1 story problem solving. All
these findings consistently support the third hypothesis that working
memory is a factor that contributes to mathematics learning independently
of children’s ability to count and reason additively.

Table 16

The Additional Amount of Variance of Time 2 Story Problem Solving Explained by Working
Memory Beyond All the Other Factors (N = 115)

| Model | Variables entered into model | R² | R² change | F change | Significant F change | (df) |
|---|---|---|---|---|---|---|
| 1 | Age in months | .008 | .008 | .865 | .354 | (1, 113) |
| 2 | Age in months; Nonverbal intelligence | .042 | .034 | 4.032* | .047 | (1, 112) |
| 3 | Age in months; Nonverbal intelligence; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge | .515 | .473 | 26.372*** | <.001 | (4, 108) |
| 4 | Age in months; Nonverbal intelligence; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge; Central executive; Phonological loop; Visuospatial sketchpad | .581 | .066 | 5.472** | .002 | (3, 105) |

* Significant at the .05 level. ** Significant at the .01 level. *** Significant at the .001 level.

Overall Summary of the Regression Analyses

Several key findings from the regression analyses with respect
to the hypotheses are summarized as follows. First, it was hypothesized
that counting ability is important in children’s mathematics
learning and that its influence is independent of that of general
cognitive capacities, such as working memory. This hypothesis is
only partially supported. Counting ability contributes significantly
to explaining the variance in calculation beyond the influence of
age, IQ, and working memory. However, it does not make a unique
contribution to story problem solving. Procedural counting appears
not to be a good predictor of children’s performance in either
calculation or story problem solving in the present study. However,
conceptual knowledge of counting is a unique predictor of
calculation, independent of age, IQ, and working memory, at both
time points.

Second, it was hypothesized that additive reasoning (as assessed
by knowledge of commutativity and the complement principle) is
independent of and more important than counting ability and
general cognitive capacities, such as working memory, in children’s
mathematics learning. The findings show that commutativity
and complement knowledge are independently and strongly
related to children’s performance in calculation and story problem
solving concurrently and longitudinally. The amount of variance
explained by additive reasoning is larger than that explained by
any of the other factors, such as counting ability and working memory.
Therefore, the second hypothesis is strongly supported.

Third, it was hypothesized that working memory is important in
its own right in explaining variations in mathematical achievement.
In support of this hypothesis, working memory explains a
significant amount of variance in calculation and story problem
solving concurrently and longitudinally beyond the effects of all
the other factors. The central executive component of working
memory is a unique predictor of children’s performance in calculation
and story problem solving concurrently and longitudinally,
even after the effects of other factors, such as counting ability and
additive reasoning, are controlled for. However, the phonological
loop and visuospatial sketchpad appear not to make significant
contributions to children’s performance in either calculation or
story problem solving.

Taken  together,  among  the  three  main  predictors  of  interest  in 
the  present  study,  working  memory  and  additive  reasoning  seem  to 
be  more  important  than  counting  ability  for  children’s  mathematics 
learning.  The  present  study  suggests  that  the  central  executive 
component  of  working  memory  as  well  as  the  knowledge  of  the 
commutativity  and  complement  principles  are  particularly  crucial. 

Autoregressive  Models  of  Calculation  and  Story 
Problem  Solving 

Autoregressive analysis for T2 calculation. The preceding multiple
regression models show that working memory and additive
reasoning are significant longitudinal predictors of children’s
performance in calculation and story problem solving. One question
remains: Are these variables strong enough to be unique predictors
of mathematical achievement (T2) even when children’s previous
performance in mathematical achievement (T1) is taken into account?
This question is important because it is possible that what
these predictors had in common with mathematical achievement at
T1 actually explains their longitudinal power. If they remain
significant longitudinal predictors of variance after children’s
performance in mathematical achievement at T1 is controlled for, the
case for their predictive value is very strong. Thus, two sets of
autoregressive analyses were conducted to examine this question.
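Written out as an equation, the final step of each autoregressive model has roughly the following form. This is an illustrative sketch rather than a formula taken from the original analyses: the coefficient symbols $b_0, \dots, b_{10}$, the error term $\varepsilon$, and the predictor labels are assumed notation, chosen to match the variable names listed in the regression tables.

```latex
\begin{aligned}
\text{Math}_{T2} ={}& b_0 + b_1\,\text{Age} + b_2\,\text{IQ} + b_3\,\text{Math}_{T1} \\
 &+ b_4\,\text{ProceduralCounting} + b_5\,\text{CountingKnowledge} \\
 &+ b_6\,\text{CentralExecutive} + b_7\,\text{PhonologicalLoop}
  + b_8\,\text{VisuospatialSketchpad} \\
 &+ b_9\,\text{Commutativity} + b_{10}\,\text{Complement} + \varepsilon
\end{aligned}
```

Here $\text{Math}_{T1}$ and $\text{Math}_{T2}$ stand for the same outcome (calculation or story problem solving) at the two time points. A block of predictors entered last is judged by how much R² rises once $\text{Math}_{T1}$ and every other term are already in the model.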

Table 17 shows that when the variables of additive reasoning
were entered in the final step, the additional amount of variance in
T2 calculation that they accounted for was significant
(3.9%). Both commutativity and complement knowledge remained
unique predictors of T2 calculation even when children’s previous
performance in calculation at T1 and all the other factors were
taken into account: commutativity knowledge (β = 0.144, t =
2.217, p < .05) and complement knowledge (β = 0.159, t = 2.50,
p = .01).

When the variables of working memory were entered in the final
step after the effects of T1 calculation and all the other factors were
controlled for (see Table 18), working memory explained a significant
amount of variance in T2 calculation (2.9%). The central
executive was a unique predictor of T2 calculation even when
children’s previous performance in calculation at T1 and all the
other factors were considered (β = 0.155, t = 2.588, p = .011).

Autoregressive  analysis  for  T2  story  problem  solving. 

Table 19 shows that when the variables of additive reasoning were
entered in the final step, they accounted for a significant amount of
variance in T2 story problem solving (6.9%) when T1 story problem
solving and all the other factors were controlled for. Both
commutativity and complement knowledge remained unique predictors
of T2 story problem solving even when children’s previous
performance in story problem solving at T1 and all the other
factors were taken into account: commutativity knowledge (β =
0.164, t = 2.168, p < .05) and complement knowledge (β =
0.261, t = 3.607, p < .001).

Table 20 shows that when the variables of working memory
were entered in the final step after the effects of T1 story problem
solving and all the other factors were controlled for, working
memory explained a significant amount of variance in T2 story
problem solving (2.8%). The central executive was a unique predictor
of T2 story problem solving even when children’s previous
performance in story problem solving at T1 and all the other
factors were taken into account (β = 0.178, t = 2.734, p < .01).

In summary, the central executive component of working memory
as well as the knowledge of the commutativity and complement
principles are strong predictors of children’s mathematical
achievement. These variables continued to account for significant
amounts of variance in calculation and story problem solving
longitudinally even when children’s previous performance in the
mathematical achievement tasks was taken into account. Thus, the
autoregressive analyses confirm the power and importance of
additive reasoning and working memory in children’s mathematics
learning.

Specificity  of  predictions  made  by  additive  reasoning  tasks. 

Previous analyses have established a strong relation between children’s
ability to reason mathematically and their achievement in
mathematics, but it is possible that their performance in the additive
reasoning tasks may predict their attainment in other academic
subjects as well. Examining this possibility is important because it
will tell us more about why additive reasoning
predicts mathematical achievement so strongly. This may be because
the associations that children have to reason about in these
tasks are specific to mathematics learning, in which case it would
not be likely that children’s performance in additive reasoning
would predict children’s performance in a subject that does not
involve mathematical reasoning, such as word reading. If additive
reasoning tasks predict mathematics learning just because they
measure reasoning in general, they should also predict children’s
performance in other nonmathematical academic subjects.

Table 17

The Additional Amount of Variance of Time 2 Calculation Explained by Additive Reasoning
Beyond Time 1 (T1) Calculation and All the Other Factors (N = 115)

| Model | Variables entered into model | R² | R² change | F change | Significant F change | (df) |
|---|---|---|---|---|---|---|
| 1 | Age in months | .006 | .006 | .722 | .397 | (1, 113) |
| 2 | Age in months; Nonverbal intelligence | .034 | .028 | 3.197 | .076 | (1, 112) |
| 3 | Age in months; Nonverbal intelligence; T1 calculation | .668 | .635 | 212.426*** | <.001 | (1, 111) |
| 4 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge | .674 | .005 | .846 | .432 | (2, 109) |
| 5 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge; Central executive; Phonological loop; Visuospatial sketchpad | .696 | .022 | 2.781* | .047 | (3, 106) |
| 6 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge; Central executive; Phonological loop; Visuospatial sketchpad; Commutativity knowledge; Complement knowledge | .734 | .039 | 7.567*** | .001 | (2, 104) |

* Significant at the .05 level. *** Significant at the .001 level.

Chinese word reading was also used as one of the T2 outcome
measures for testing the specificity of predictors for mathematical
performance. All of the stimuli in the reading tasks involve
two characters. Sometimes Chinese children can reason from
one character as a cue to guess the pronunciation of another
character and the meaning of the word. Thus, this task may also
demand some aspects of general reasoning. Bivariate correlations
show that the central executive correlated significantly
with children’s performance in reading (r = .271, p < .01). By
contrast, neither measure of additive reasoning correlated
significantly with children’s scores in reading
(r = 0.124, p > .05). Therefore, the fact that the additive
reasoning tasks predicted children’s mathematical achievement
much better than their Chinese word reading confirms the specificity
and importance of additive reasoning in supporting children’s
learning of mathematics.

Discussion 

The purpose of this study was to evaluate the relative importance
of working memory, counting ability, and additive reasoning
in children’s mathematics learning. The key findings of this study
have contributed to the literature in several ways. First, Nunes and
colleagues (2007, 2012) found that quantitative reasoning was a
significant and specific predictor of children’s mathematical
achievement beyond IQ and working memory. The present study
replicated the finding regarding the close connection between
quantitative reasoning and mathematical achievement in a
non-Caucasian cultural context. Second, whereas previous studies
demonstrated a strong link between a global measure of quantitative
reasoning and test scores on general mathematical achievement,
the present study showed that mathematical reasoning in the domain
of addition and subtraction in particular related significantly
to both calculation and story problem solving concurrently and
longitudinally. Third, the autoregressive analyses indicated that
variables in additive reasoning and the central executive component
of working memory remained independent predictors of T2
mathematical achievement beyond the influence of children’s
performance on T1 mathematical achievement. This is strong evidence
for the predictive powers of these variables. Fourth, this
study incorporated procedural and conceptual counting as the
indicators of counting ability and showed that only conceptual
counting was uniquely predictive of calculation, but not of story
problem solving.

Table 18

The Additional Amount of Variance of Time 2 Calculation Explained by Working Memory
Beyond Time 1 (T1) Calculation and All the Other Factors (N = 115)

| Model | Variables entered into model | R² | R² change | F change | Significant F change | (df) |
|---|---|---|---|---|---|---|
| 1 | Age in months | .006 | .006 | .722 | .397 | (1, 113) |
| 2 | Age in months; Nonverbal intelligence | .034 | .028 | 3.197 | .076 | (1, 112) |
| 3 | Age in months; Nonverbal intelligence; T1 calculation | .668 | .635 | 212.426*** | <.001 | (1, 111) |
| 4 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge | .674 | .005 | .846 | .432 | (2, 109) |
| 5 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge | .705 | .032 | 5.767** | .004 | (2, 107) |
| 6 | Age in months; Nonverbal intelligence; T1 calculation; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge; Central executive; Phonological loop; Visuospatial sketchpad | .734 | .029 | 3.8** | .012 | (3, 104) |

** Significant at the .01 level. *** Significant at the .001 level.

In the following sections, we discuss the key findings with
respect to the hypotheses of this study and the extent to which the
findings are consistent with previously published knowledge on
the topic. Then, the theoretical and educational implications of the
results on children’s mathematics learning and education are
discussed. Finally, limitations of this study are identified and
suggestions for future research are made toward the end of this article.

Contributions  of  Counting  Ability 

On the basis of the mathematical thinking perspective, counting
ability was hypothesized to be one of the important cognitive
foundations for children’s mathematics learning. The mathematical
thinking perspective emphasizes that children need to understand
the meanings of number to perform well in mathematics. The
knowledge of the meanings of number refers to the understanding
of the relations between numbers and quantities. Learning to count
is relevant in this respect because it provides children with words
to represent quantities. It also helps children reflect upon and
develop the concepts of one-to-one correspondence, ordinality, and
cardinality as well as the coordinated use of these counting
principles. According to the mathematical thinking perspective,
grasping the conceptual knowledge of counting is more important than
reciting a counting sequence in children’s mathematics learning.

The regression analyses of this study show that counting ability
accounted for a significant amount of variance in calculation (both
concurrently and longitudinally) after the effects of age, IQ, and
working memory were controlled for. Thus, the first hypothesis
was supported for calculation in this study. Among the counting
measures, conceptual knowledge of counting was a unique predictor
of children’s performance in calculation beyond the influence
of age, IQ, and working memory. When children learn to calculate,
they usually start from counting all of the numbers presented (i.e.,
the count-all procedure) and later shift to counting on from the
cardinal value of the first or larger number presented (Fuson,
1982). The more efficient counting-on procedures may rely on
conceptual knowledge of counting, such as the understanding of
cardinality. Thus, conceptual knowledge of counting contributes to
children’s success in calculation.

However, another measure of counting ability, procedural counting,
did not make independent contributions to any measure of mathematical
achievement. This finding is at odds with those from previous
longitudinal studies (e.g., Aunola et al., 2004; Koponen, Aunola,
Ahonen, & Nurmi, 2007; Koponen, Salmi, Eklund, & Aro, 2013;
Passolunghi et al., 2007; Zhang, Koponen, Räsänen, Aunola, Lerkkanen,
& Nurmi, 2014). For example, one study (Zhang et al., 2014)
showed that the impact of procedural counting was so strong that it
fully mediated the longitudinal associations of spatial visualization
and letter knowledge with arithmetic performance in a group of
Finnish children. The discrepancy between the findings could be
attributable to the ceiling performance of children in the present study,
which may relate to the languages spoken by the children participating
in the different studies.

Table 19

The Additional Amount of Variance of Time 2 Story Problem Solving Explained by Additive
Reasoning Beyond Time 1 (T1) Story Problem Solving and All the Other Factors (N = 115)

| Model | Variables entered into model | R² | R² change | F change | Significant F change | (df) |
|---|---|---|---|---|---|---|
| 1 | Age in months | .008 | .008 | .865 | .354 | (1, 113) |
| 2 | Age in months; Nonverbal intelligence | .042 | .034 | 4.032* | .047 | (1, 112) |
| 3 | Age in months; Nonverbal intelligence; T1 story problem solving | .574 | .532 | 138.703*** | <.001 | (1, 111) |
| 4 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge | .580 | .006 | .787 | .458 | (2, 109) |
| 5 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge; Central executive; Phonological loop; Visuospatial sketchpad | .601 | .021 | 1.739 | .109 | (3, 106) |
| 6 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge; Central executive; Phonological loop; Visuospatial sketchpad; Commutativity knowledge; Complement knowledge | .670 | .069 | 10.852*** | <.001 | (2, 104) |

* Significant at the .05 level. *** Significant at the .001 level.

In counting, there are units of different sizes that can be counted
within different classes. For example, we have the class of ones,
the class of tens, the class of hundreds, and so on. Because most of
us use a base-10 system, when we have 10 units of any size, we
regroup these into units of the next size. For instance, 10 “ones”
make up one “ten,” and 10 “tens” make up one “hundred.” In
Chinese number words, the base structure of the number system is
transparent. Counting with Chinese number words makes it easy for
children to recognize that they are counting different units, and they can
repeat the same reasoning indefinitely to generate number words
that they have not been formally taught. Thus, the systematic
relation between the number words in the Chinese language and
the underlying base-10 values may contribute to Chinese-speaking
children’s early mastery of procedural counting (Miller & Stigler,
1987; Miller et al., 1995; Miura et al., 1988). This may be one of
the reasons that the children in the present study had exceptional
performance in procedural counting. Thus, the variation in children’s
performance on this task was small in this study, which
might have influenced the strength of its relation to mathematical
achievement.
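The attenuating effect of a ceiling can be illustrated with a small simulation. The sketch below uses entirely hypothetical, randomly generated scores (not data from this study) and an assumed cut-off, `CEILING`, to show how a task that most children pass perfectly correlates more weakly with an outcome than the underlying skill does.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical latent counting skill and a math score that depends on it.
skill = [rng.gauss(0.0, 1.0) for _ in range(5000)]
math_score = [0.6 * s + rng.gauss(0.0, 0.8) for s in skill]

# An easy task: everyone above the cut-off receives the same maximum score.
CEILING = -0.5  # assumed cut-off; most simulated children reach it
observed = [min(s, CEILING) for s in skill]

r_latent = pearson_r(skill, math_score)
r_observed = pearson_r(observed, math_score)
print(f"latent skill vs. math:   r = {r_latent:.2f}")
print(f"ceilinged task vs. math: r = {r_observed:.2f}")
```

In this simulated setting the ceilinged measure correlates clearly less strongly with the outcome than the latent skill does, even though the underlying relation is unchanged.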

By contrast, most of the children who participated in previous
longitudinal research spoke European languages (e.g., English and
Finnish). In these languages, the base structure of the number
system is only partially reflected in the language; for example,
“tens” are counted with different names, such as ten, twenty, thirty, and so
forth. This may render it more difficult for some young children to
grasp the underlying structure of the counting system, thereby
contributing to greater variation in procedural counting in these
children. Thus, the impact of languages on structuring the number
system in different cultures may explain the divergent findings
regarding procedural counting across studies.
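The regularity of Chinese number words can be made concrete with a short sketch. It builds English glosses of Chinese-style number words for 1 to 99 (for instance, 11 is literally “ten one” and 20 is “two ten”); the function name `transparent_name` and the use of English glosses instead of Chinese characters are illustrative choices, not materials from the study.

```python
DIGITS = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def transparent_name(n):
    """English gloss of a Chinese-style number word for 1 <= n <= 99."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1 to 99")
    tens, ones = divmod(n, 10)
    parts = []
    if tens == 1:
        parts.append("ten")                  # 10-19: "ten", then the ones digit
    elif tens > 1:
        parts.extend([DIGITS[tens], "ten"])  # 20-99: "<digit> ten"
    if ones:
        parts.append(DIGITS[ones])
    return " ".join(parts)


for n in (11, 12, 20, 35):
    print(n, "->", transparent_name(n))
```

Two rules and nine digit names generate every word up to 99, whereas English requires separate forms such as “eleven,” “twelve,” and “twenty” that do not show the tens-and-ones structure directly.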

Table 20

The Additional Amount of Variance of Time 2 Story Problem Solving Explained by Working
Memory Beyond Time 1 (T1) Story Problem Solving and All the Other Factors (N = 115)

| Model | Variables entered into model | R² | R² change | F change | Significant F change | (df) |
|---|---|---|---|---|---|---|
| 1 | Age in months | .008 | .008 | .865 | .354 | (1, 113) |
| 2 | Age in months; Nonverbal intelligence | .042 | .034 | 4.032* | .047 | (1, 112) |
| 3 | Age in months; Nonverbal intelligence; T1 story problem solving | .574 | .532 | 138.703*** | <.001 | (1, 111) |
| 4 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge | .580 | .006 | .787 | .458 | (2, 109) |
| 5 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge | .642 | .062 | 9.29*** | <.001 | (2, 107) |
| 6 | Age in months; Nonverbal intelligence; T1 story problem solving; Procedural counting; Counting knowledge; Commutativity knowledge; Complement knowledge; Central executive; Phonological loop; Visuospatial sketchpad | .670 | .028 | 3.791* | .047 | (3, 104) |

* Significant at the .05 level. *** Significant at the .001 level.

It was demonstrated in the present study that conceptual knowledge
of counting was a stronger predictor than procedural counting
of all measures of mathematical achievement. The ceiling effect
may be one explanation of these results. Another possible
interpretation is that conceptual knowledge of counting is more
important than procedural counting in children’s mathematics
learning. In the present study, conceptual counting was measured
by a task that required children to identify incorrect ways of
counting and to coordinate their knowledge of various counting
principles to determine the cardinal value of a set. It has been
argued that it is necessary for children to coordinate different
counting principles to understand the logic of numbers (Nunes &
Bryant, 2015). On the basis of the mathematical thinking perspective,
counting involves not only the memorization of the number
words in a fixed order but also an understanding of how number
labels are generated, which goes beyond simple memorization of
labels. It has been suggested that the reason counting ability
contributes to explaining variation in mathematical achievement is that
it helps children reflect on the relations between quantities and
numbers (e.g., Piaget, 1952; Piaget & Inhelder, 1975; Nunes &
Bryant, 2015). Thus, if a child can only generate number words
proficiently but fails to understand the logic of counting, she or he
is not likely to do well in mathematics according to the mathematical
thinking perspective. Consistent with this view, the findings
of this study show that conceptual knowledge of counting had
a stronger connection than procedural counting with both calculation
and story problem solving concurrently and longitudinally.

Thus, the present study adds to the literature by showing that individual
differences in conceptual knowledge of counting may matter
more than procedural counting in mathematical achievement.
This finding has several implications. First, from a methodological
perspective, this evidence suggests that conceptual knowledge
of counting may be a better measure of counting ability
than procedural counting, especially for children who speak a
language in which the organization of number words fits well
with the base-10 system. Second, from an educational viewpoint,
the finding suggests that learning the numerical symbols
of counting by themselves is not sufficient for children to
succeed in mathematics. Past evidence showed that there could
be a disconnection between using numbers and understanding
the logic of counting (e.g., Bermejo et al., 2004; Freeman et al.,
2000; Sarnecka & Gelman, 2004; Sophian, 1988). Thus, teachers
and parents need to ensure that children learn not only to
count fluently but also to think about the logical connections
between the numbers they use for counting and quantities.

Contributions  of  Additive  Reasoning 

The second hypothesis of this study is that additive reasoning
is independent of and more important than working memory
and counting ability in children’s mathematics learning. This
hypothesis is strongly supported by the findings. Consistent
with this hypothesis, additive reasoning was shown to make
independent contributions to explaining variance in calculation
and story problem solving over and above the effects of age,
IQ, working memory, and counting ability at both waves of
assessments. The regression analyses showed that the additional
amount of variance explained by additive reasoning beyond all
the other factors was substantial (close to 30% for both calculation
and story problem solving).

On  the  basis  of  Piaget’s  logical  operations  framework,  some 
researchers  (e.g.,  Nunes  &  Bryant,  1996,  2015;  Thompson, 
1993,  1994;  Vergnaud,  1997,  2009)  have  argued  that  children’s 
competencies  to  reason  about  quantities  logically  are  of  primary 
importance  for  mathematical  development.  In  the  domain  of 
additive  reasoning,  it  is  important  to  understand  that  quantities 
are  connected  by  part-whole  relations.  Two  central  properties 
of  part-whole  relations  involve  (a)  commutativity  and  (b)  the 
inverse relation between addition and subtraction. Commutativity
refers to the irrelevance of addend order to the sum, that
is, “a + b = c” implies “b + a = c,” whereas the complement
principle refers to the inverse relation between addition and
subtraction, that is, “a + b = c” implies “c − a = b.” A few
studies  have  shown  that  global  measures  of  quantitative  reason¬ 
ing  are  main  predictors  of  children’s  later  mathematical 
achievements (Nunes et al., 2007, 2012; Stern, 2005). The
findings of the present study extend these results to a sample of
non-Caucasian  children  and  establish  that  reasoning  about  part- 
whole  relations  in  particular  is  critical  for  success  in  both 
calculation  and  story  problem  solving. 
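
The two properties of part-whole relations can be written as simple checks on whole numbers. The sketch below is illustrative only; the function names are hypothetical and are not taken from the tasks used in the study.

```python
def commutativity_holds(a, b):
    """Addend order is irrelevant: a + b and b + a give the same sum."""
    return a + b == b + a


def complement_holds(a, b):
    """If a + b = c, then removing one part from c leaves the other part."""
    c = a + b
    return c - a == b and c - b == a


# Check both properties for every pair of small whole numbers.
pairs = [(a, b) for a in range(21) for b in range(21)]
print(all(commutativity_holds(a, b) for a, b in pairs))
print(all(complement_holds(a, b) for a, b in pairs))
```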

Another  important  and  novel  finding  of  this  research  is  that 
when  the  effects  of  T1  mathematical  achievement  were  controlled 
for,  the  influence  of  additive  reasoning  on  T2  mathematical 
achievement remained significant. This evidence is important because
it shows that what additive reasoning had in common with
mathematical achievement at T1 did not explain its longitudinal
predictive power. The fact that additive reasoning remained a
significant  longitudinal  predictor  of  variance  after  children’s  per¬ 
formance  in  mathematical  achievement  at  T1  was  controlled  for 
suggests  that  the  predictive  value  of  additive  reasoning  is  very 
strong. 

This  study  also  demonstrated  a  strong  specificity  of  the  additive 
reasoning  tasks.  The  tasks  were  intended  to  measure  the  mathe¬ 
matical  reasoning  ability  of  children,  but  it  is  also  possible  that 
children  need  to  rely  heavily  on  other  skills,  such  as  general 
cognitive  resources  to  complete  the  tasks.  In  other  words,  the  tasks 
may  not  just  measure  mathematical  reasoning,  but  reasoning  in 
general.  These  two  possibilities  were  ruled  out  by  two  findings. 
First, scores on the additive reasoning tasks did not correlate
significantly with IQ, working memory, or counting ability.
Thus, it appears that the additive reasoning tasks did not load on
these other cognitive competencies. Second, if they measure general
reasoning  ability,  rather  than  mathematical  reasoning  ability,  they 
should  correlate  significantly  with  the  scores  on  Chinese  word 
reading.  The  result  showed  that  there  was  no  significant  correlation 
between  additive  reasoning  and  word  reading.  These  results  sug¬ 
gest  that  (a)  the  tasks  were  tapping  a  specific  aspect  of  reasoning, 
that  is,  additive  reasoning,  and  that  (b)  additive  reasoning  predicts 
mathematics  because  it  is  a  measure  of  competence  specific  to 
mathematics  learning. 

Why  is  there  a  strong  link  between  additive  reasoning  and 
mathematical  achievement?  One  possibility  derives  from  consid¬ 
ering  the  contributions  of  additive  reasoning  to  the  understanding 
of  the  nature  of  number  and  the  use  of  more  efficient  problem 
solving  strategies.  According  to  the  mathematical  thinking  per¬ 
spective,  arithmetic  is  the  study  and  use  of  relations  between 
numbers  to  solve  problems  and  this  is  always  carried  out  using  a 
number  system,  which  has  specific  characteristics.  From  this  per¬ 
spective,  arithmetic  is  not  just  about  memorizing  number  facts. 
Instead,  the  process  of  calculation  requires  a  deep  understanding  of 
number  and  of  relations  between  operations.  This  understanding 
may  form  the  basis  for  developing  more  advanced  computational 
strategies  that  help  children  modify  complex  problems  to  make 
them easier to solve (e.g., Canobi, 2004; Canobi et al., 2003;
Fuson,  1990;  Gilmore  &  Bryant,  2006;  Nunes  &  Bryant,  1996, 
2015). 

For  example,  some  efficient  computational  strategies  (Gaschler, 
Vaterrodt,  Frensch,  Eichler,  &  Haider,  2013;  Shrager  &  Siegler, 
1998),  such  as  counting  all  starting  with  the  larger  addend  and 
counting  on  from  the  larger  addend,  require  the  understanding  that 
numerical  order  does  not  affect  the  outcome  in  addition  (i.e.,  the 
commutativity  principle).  The  commutativity  knowledge  may  also 
relate  to  the  development  of  other  strategies,  such  as  the  “10 
strategy”  and  “addends-compare  strategy.”  The  10  strategy  refers 
to individuals’ reordering different addends within a problem in an
attempt to exploit the circumstance that nonadjacent numbers add
up to 10. For instance, children who understand the commutativity
principle can transform the problem “3 + 6 + 7” into “(3 + 7) +
6,” which is easier for them to solve. For some arithmetic problems,
computation can become unnecessary if one recognizes that the
identical addends had been shown, though in a different order (e.g.,
“2 + 1 + 8”), in a previous problem that had already been solved
(e.g., “8 + 7 + 2”). This addends-compare strategy also
demands  the  application  of  the  commutativity  knowledge  between 
problems  (Gaschler  et  al.,  2013). 
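
Both strategies can be sketched as short procedures. The code below is a minimal illustration under stated assumptions: the function names and the sample problems are hypothetical, and the sketch is not a model of how children actually carry out these strategies.

```python
from collections import Counter


def ten_strategy(addends):
    """Group addends into pairs that sum to 10, plus whatever is left over.

    Regrouping is legitimate only because addend order does not affect
    the sum (the commutativity principle).
    """
    remaining = list(addends)
    pairs, leftovers = [], []
    while remaining:
        first = remaining.pop(0)
        partner = 10 - first
        if partner in remaining:
            remaining.remove(partner)
            pairs.append((first, partner))
        else:
            leftovers.append(first)
    return pairs, leftovers


def same_addends(problem_a, problem_b):
    """Addends-compare: do two problems contain the same addends in any order?"""
    return Counter(problem_a) == Counter(problem_b)


pairs, leftovers = ten_strategy([3, 6, 7])
total = 10 * len(pairs) + sum(leftovers)
print(pairs, leftovers, total)

# Illustrative problems: the second reuses the addends of the first,
# so the earlier answer can be reused without computing again.
print(same_addends([4, 9, 5], [5, 4, 9]))
```

For the problem 3 + 6 + 7, the sketch pairs 3 with 7 and leaves 6, which mirrors the regrouping “(3 + 7) + 6” described above.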

The  complement  principle  may  contribute  to  the  use  of  a  strat¬ 
egy  called  “indirect  addition,”  in  which  children  can  use  additions 
to  solve  subtraction  problems  effectively  if  the  numbers  are  close 
to each other. For example, to solve “21 − 18,” children are less likely to
make mistakes if they count up from 18 to 21. Thus, the use of
more  advanced  computational  strategies  may  be  one  of  the  reasons 
that  children  with  better  understanding  of  the  commutativity  and 
complement  principles  performed  better  on  the  calculation  tasks. 
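
Indirect addition can be illustrated with a short counting procedure. This is a sketch under the assumption that both numbers are whole numbers and that the minuend is at least as large as the subtrahend; the function name is hypothetical.

```python
def indirect_addition(minuend, subtrahend):
    """Solve minuend - subtrahend by counting up from the subtrahend."""
    if subtrahend > minuend:
        raise ValueError("this sketch assumes minuend >= subtrahend")
    steps = 0
    current = subtrahend
    while current < minuend:
        current += 1   # one counting step upward
        steps += 1
    return steps


print(indirect_addition(21, 18))
```

When the two numbers are close, only a few counting steps are needed, which is why the strategy is efficient for problems such as 21 − 18.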

Knowledge  of  these  principles  may  also  help  the  children  ana¬ 
lyze  the  mathematical  situations  presented  in  story  problems  more 
effectively.  According  to  the  mathematical  thinking  perspective, 
learning  mathematics  should  be  based  on  understanding  the  rela¬ 
tions  between  quantities  and  operating  on  the  numbers  to  reach 
conclusions  about  the  quantities.  Story  problems  are  texts  that 
involve  information  about  quantities,  which  typically  “describe(s) 
a  situation  assumed  familiar  to  the  reader  and  pose(s)  a  quantita¬ 
tive  question,  an  answer  to  which  can  be  derived  by  mathematical 
operations  performed  on  the  data  provided  in  the  text,  or  otherwise 
inferred”  (Greer,  Verschaffel,  &  De  Corte,  2002,  p.  271).  Solving 
an  additive  story  problem  has  been  viewed  as  selecting  and  acti¬ 
vating  appropriate  cognitive  schema  and  filling  the  empty  “slots” 
of  the  activated  schema  with  information  provided  in  the  story  text. 
Some  of  these  problem  types  (e.g.,  result-unknown  change  prob¬ 
lems  and  total  set-unknown  combine  problems)  are  suggested  to 
link  easily  to  counting  or  calculation  schemes  readily  available  in 
individuals’ cognitive repertoire (Carpenter et al., 1981; De Corte
& Verschaffel, 1985, 1987; Ginsburg, 1982). Other more difficult
problems  (e.g.,  start-unknown  change  problems)  require  additional 
re-representational  steps  that  involve  the  application  of  the  part- 
whole  schema  before  a  connection  with  a  proper  counting  or 
operation  scheme  could  be  formed.  The  understanding  of  the 
inverse  relation  between  addition  and  subtraction  (the  complement 
principle) and the commutative nature of addition may help
children  reason  about  the  underlying  structure  of  the  quantitative 
relations  in  the  story. 

Knowledge  of  the  commutativity  principle  may  also  be  related 
to  children’s  solving  some  missing  addend  problems  (Nunes  & 
Bryant,  2015).  Consider  this  example:  “Jane  had  three  cookies,  got 
some  more  and  now  has  seven.  How  many  more  cookies  did  she 
get?” Children can easily solve this problem by representing the
first addend with three fingers, counting up to the final state, that is,
seven fingers, and evaluating how many fingers they had to add in
the  process.  However,  if  the  problem  has  the  first  rather  than  the 
second  addend  missing,  for  example,  “Jane  had  some  cookies;  her 
mother  gave  her  four  more  and  now  she  has  seven.  How  many  did 
she  have  to  start  with?”  the  children  have  to  understand  that  the 
order does not affect the total. Those who understand the commutativity
principle can start from the second addend, that is, 4, add up
to 7, and count how many were added. Children who do not
understand commutativity may find this problem difficult to solve
because they do not know how many cookies Jane had to start with.
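
The two cookie problems can be sketched with the same counting-up routine. The code is a minimal illustration; the function names are hypothetical, and the second function simply makes explicit the reordering that commutativity permits.

```python
def count_up(start, total):
    """Count up from start to total and return how many were added."""
    if start > total:
        raise ValueError("start must not exceed total")
    added = 0
    while start + added < total:
        added += 1
    return added


def change_unknown(start, total):
    """start + ? = total: count up directly from the known first addend."""
    return count_up(start, total)


def start_unknown(change, total):
    """? + change = total: by commutativity, treat it as change + ? = total."""
    return count_up(change, total)


print(change_unknown(3, 7))   # Jane had three cookies and now has seven
print(start_unknown(4, 7))    # Jane got four more and now has seven
```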

Contributions  of  Working  Memory 

Learning  and  using  mathematics,  including  thinking  mathemat¬ 
ically,  must  draw  on  some  general  cognitive  resources,  such  as 
working  memory.  Thus,  it  was  hypothesized  that  working  memory 
makes  a  contribution  to  mathematical  achievement,  even  when  one 
has accounted for children’s specific mathematical knowledge,
such as their counting ability and additive reasoning.
Consistent  with  this  hypothesis,  working  memory  continued  to 
account  for  a  significant  amount  of  variance  in  all  measures  of 
mathematical  achievement  after  all  the  other  factors  were  con¬ 
trolled  for.  This  suggests  that  working  memory  is  a  stable  factor 
that  contributes  to  children’s  mathematical  achievement  from  the 
first  to  second  grade.  In  contrast  to  additive  reasoning,  working 
memory  was  not  a  specific  predictor  for  mathematical  achieve¬ 
ment  because  there  was  also  a  significant  correlation  between 
working  memory  and  Chinese  word  reading. 

Among  the  three  components  of  working  memory,  the  central 
executive  appeared  to  be  a  stronger  predictor  than  phonological 
loop  and  visuospatial  sketchpad.  The  central  executive  was  found 
to  be  a  unique  and  significant  predictor  of  variations  in  calculation 
and  story  problem  solving  at  both  time  points.  By  contrast,  visu¬ 
ospatial  sketchpad  did  not  correlate  with  mathematical  achieve¬ 
ment  at  all,  whereas  phonological  loop  did  not  make  a  unique 
contribution to mathematical achievement when the effect of the central
executive was taken into account. This finding is consistent with
previous  research  that  shows  that  measures  of  the  central  executive 
are  especially  strong  predictors  of  children’s  performance  in  math¬ 
ematics  (e.g.,  Cowan  &  Powell,  2014;  Gathercole  &  Pickering, 
2000;  Holmes  &  Adams,  2006;  Keeler  &  Swanson,  2001;  Lee  et 
al.,  2004;  Lehto,  1995;  Noel  et  al.,  2004;  Swanson  &  Beebe- 
Frankenberger,  2004;  Wilson  &  Swanson,  2001).  Most  of  these 
studies  have  measured  central  executive  by  memory  span  tasks 
that  demand  simultaneous  monitoring  and  storage  of  information. 
The  evidence  suggests  that  the  particular  central  executive  func¬ 
tion  of  monitoring  and  coordinating  concurrent  processing  and 
storage  of  information  is  crucial  for  children’s  performance  on 
mathematical  tasks.  From  the  mathematical  thinking  perspective, 
the  central  executive  may  support  children  to  think  about  the 
relations  between  numbers,  to  make  a  decision  about  appropriate 
strategy  use  to  calculate,  and  then  allocate  attentional  resources  to 
implement  the  selected  strategy.  One  study  showed  that  the  central 
executive  component,  rather  than  the  phonological  loop  and  visu¬ 
ospatial  sketchpad,  was  associated  with  strategy  use  in  calculation 
(Wu  et  al.,  2008).  When  solving  story  problems,  the  central  exec¬ 
utive  may  also  support  children  to  reason  about  the  underlying 
quantitative  structure  of  story  problems,  to  identify  the  operations 
required  to  solve  the  problem,  while  working  out  the  solution. 

The  finding  that  visuospatial  sketchpad  did  not  correlate  with 
mathematical  achievement  in  this  study  may  be  explained  by  the 
age  of  the  participants.  Some  studies  suggest  that  preschool  chil¬ 
dren  tend  to  have  better  performance  on  nonverbal  compared  with 
verbal  arithmetic  tasks  and  that  individual  differences  in  visuospa¬ 
tial  sketchpad  are  the  best  predictor  of  mathematical  achievement 
in  this  age  group  (McKenzie,  Bull,  &  Gray,  2003;  Rasmussen  & 
Bisanz, 2005; Simmons, Chris, & Horne, 2008). However, it has
been argued that from primary school onward, children become
increasingly  reliant  on  verbal  rehearsal  to  retain  materials  in  mem¬ 
ory  (Hitch,  Halliday,  Schaafstal,  &  Schraagen,  1988).  Consistent 
with  this  idea,  Rasmussen  and  Bisanz  (2005)  showed  that  by  the 
first  grade,  children  performed  equally  well  on  verbal  and  nonver¬ 
bal  mathematical  tasks,  and  that  phonological  loop  became  the  best 
predictor  of  children’s  performance  on  verbal  problems  (Rasmus¬ 
sen  &  Bisanz,  2005).  The  nonsignificant  correlation  between  visu¬ 
ospatial  sketchpad  and  mathematical  achievement  in  the  present 
study  also  suggests  that  this  component  of  working  memory  may 
not  be  important  for  the  children  in  this  study  at  the  age  of  around 
6  to  7  to  perform  calculation  and  to  solve  story  problems. 

The  present  study  also  showed  that  the  phonological  loop  sig¬ 
nificantly  correlated  with  calculation  at  both  time  points,  suggest¬ 
ing  that  it  may  be  important  for  mathematics  learning  in  this  age 
group. However, it was not a unique predictor when the effect of
the central executive was controlled for. One straightforward interpretation
is that the central executive is more important than the phonological
loop  in  children’s  mathematics  learning.  However,  it  has  to  be 
noted  that  the  central  executive  component  of  working  memory 
was  assessed  by  “counting  span”  and  “backward  digit  span”  tasks. 
Thus, both the central executive and phonological loop tasks may
draw on verbal processing of materials (Savage, Lavers, & Pillay,
2007). This overlap may reduce the likelihood that a unique association
between phonological loop and mathematical achievement is
observed in a regression model in which central executive measures
are also included. Future research may explore ways to
measure  the  central  executive  component  of  working  memory 
nonverbally  and  examine  its  association,  relative  to  the  phonolog¬ 
ical  loop,  with  mathematical  achievement. 

Theoretical  Implications 

In  the  introduction,  two  theoretical  perspectives  on  mathematics 
learning  were  highlighted  and  compared,  namely  the  number  sense 
perspective  and  the  mathematical  thinking  perspective.  The  pres¬ 
ent  study  was  designed  on  the  basis  of  the  latter  perspective  that 
focuses  on  how  children  think  about  mathematics  logically  and 
meaningfully.  According  to  this  view,  one  core  intellectual  de¬ 
mand  to  learn  mathematics  is  the  need  to  understand  relations 
between  quantities,  rather  than  merely  understanding  things  in 
isolation.  For  example,  Nunes  and  Bryant  (2015)  propose  that 
there  are  two  meanings  of  number — analytical  and  representa¬ 
tional.  The  analytical  meaning  of  number  is  defined  by  a  number 
system,  whereas  the  representational  meaning  refers  to  the  use  of 
numbers to represent quantities. A child who is competent in
mathematical thinking has a good understanding
of the relational meanings of numbers and quantities. This
understanding  appears  to  support  his  or  her  ability  to  excel  in  a 
variety  of  mathematical  tasks. 

The  findings  of  this  study  strongly  suggest  that  the  mathematical 
thinking  perspective  is  an  excellent  theoretical  framework  for 
understanding  mathematics  learning  and  education.  The  final  re¬ 
gression  models  that  combine  all  factors  hypothesized  to  relate  to 
mathematical  achievement  explained  over  50%  of  the  variance  in 
calculation  and  story  problem  solving,  both  concurrently  and  lon¬ 
gitudinally.  In  particular,  conceptual  knowledge  of  counting  is 
more  important  than  procedural  counting  in  predicting  children’s 
mathematical achievement in calculation. Additive reasoning, as
measured by the knowledge of the commutativity and complement
principles,  explained  variance  in  T2  mathematical  achievement 
that  was  not  accounted  for  by  T1  mathematical  achievement  and 
all  the  other  factors. 

Educational  Implications 

One  educational  implication  of  this  study  is  that  quantitative 
reasoning  should  be  a  central  aspect  addressed  in  mathematics 
education  curricula.  Children  need  to  learn  to  reason  about  rela¬ 
tions  between  quantities  to  solve  problems,  not  only  about  arith¬ 
metic. 

A  traditional  assumption  in  early  mathematics  education  is  that 
knowledge  of  arithmetic  comes  first.  Quantitative  reasoning  is 
usually  introduced  only  after  children  have  learned  arithmetic.  The 
assumption  behind  this  practice  is  that  children  will  learn  to  apply 
the  acquired  formal  arithmetic  operations  to  deal  with  various 
kinds  of  problem  situations. 

However,  it  seems  common  to  observe  that  children  learn  com¬ 
putational  algorithms  in  a  meaningless  fashion.  For  instance,  pre¬ 
vious  studies  showed  that  children  often  encountered  difficulties  in 
solving  multidigit  addition  and  subtraction  (e.g.,  Brown  &  Burton, 
1978;  Brown  &  VanLehn,  1982;  Carpenter,  Franke,  Jacobs,  Fen- 
nema,  &  Empson,  1997;  Carraher  &  Schliemann,  1985;  Carraher, 
Carraher,  &  Schliemann,  1985;  Fuson,  1990;  Hennessy,  1994; 
Hiebert & Wearne, 1996; Resnick, 1982, 1992, 1994; Selter, 2001;
Young  &  O’Shea,  1981).  Their  difficulties  can  be  understood  in 
terms of the implementation of faulty procedures (Brown &
VanLehn, 1982). For example, when calculating 237 − 49, some
children obtain the answer 212 by taking 7 away from 9 and 3
away from 4, presumably because they assume that one cannot
take a larger number from a smaller number. Another example of
faulty procedures is that when facing a subtraction such as 607 −
8, some children obtain 699 by subtracting 8 from 17. These
children  have  correctly  borrowed  and  added  to  the  ones  column, 
making  the  0  into  a  9  because  1  had  been  borrowed  from  the  tens 
column.  However,  they  forgot  that  something  had  been  borrowed 
from  the  hundreds  column.  These  are  typical  faulty  procedures  and 
well  known  to  primary  school  teachers.  Brown  and  VanLehn 
(1982)  suggested  that  they  are  not  merely  a  result  of  lack  of 
attention.  Instead,  the  mistakes  follow  from  a  systematic  applica¬ 
tion  of  erroneous  algorithms  across  different  kinds  of  problems  by 
the  same  children.  The  mathematical  thinking  approach  suggests 
that if children do not have a clear understanding of the analytical
meaning of number, that is, the relations between numbers, they are
more  likely  to  make  calculation  mistakes  because  of  the  use  of 
faulty  procedures. 
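
The two faulty procedures described above can be reproduced as small programs, which makes their systematic character visible. The sketch below is an illustration only: the function names are hypothetical, the code assumes positive whole numbers with the minuend at least as large as the subtrahend, and it is not the formal model proposed by Brown and VanLehn (1982).

```python
def digits(n, width):
    """Digits of n, least significant first, zero-padded to width."""
    return [(n // 10**i) % 10 for i in range(width)]


def from_digits(ds):
    """Rebuild a number from least-significant-first digits."""
    return sum(d * 10**i for i, d in enumerate(ds))


def smaller_from_larger(minuend, subtrahend):
    """Faulty procedure: in each column, take the smaller digit from the larger."""
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    return from_digits([abs(t - b) for t, b in zip(top, bottom)])


def borrow_across_zero(minuend, subtrahend):
    """Faulty procedure: a borrowed-from 0 becomes 9, but the borrow is
    never passed on to the next column to the left."""
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    result = []
    for i in range(width):
        if top[i] < bottom[i]:
            top[i] += 10
            if i + 1 < width:
                if top[i + 1] == 0:
                    top[i + 1] = 9      # the bug: nothing is taken from further left
                else:
                    top[i + 1] -= 1     # ordinary, correct borrowing
        result.append(top[i] - bottom[i])
    return from_digits(result)


print(smaller_from_larger(237, 49), 237 - 49)
print(borrow_across_zero(607, 8), 607 - 8)
```

Each function gives the same wrong answer every time it meets the same kind of column, which is the sense in which such errors are systematic rather than careless.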

Thus,  to  search  for  meaningful  mathematics  teaching,  educators 
should  find  ways  to  keep  teaching  connected  to  quantities  in  the 
world.  One  of  the  ways  to  achieve  this  is  to  avoid  a  predominant 
focus  on  learning  procedures  without  any  connection  to  under¬ 
standing  or  applications  that  require  these  procedures.  Additive 
reasoning  should  be  considered  as  a  domain  of  teaching  and 
learning in its own right, and numbers should not be taught in
isolation  from  quantities  and  relations  from  the  start.  Children  may 
also  be  given  simple  representational  tools,  such  as  blocks  and 
diagrams  that  represent  information  about  relations  to  solve  prob¬ 
lems,  before  they  are  taught  about  formalizations.  Research  has 
identified a number of ways to promote quantitative reasoning.
One simple way to do so is to engage children in reflecting on and
discussing the problem. For example, Bermejo et al. (2004)
showed  that  children  who  were  asked  to  discuss  what  was  the 
number  of  objects  in  a  set  when  the  counting  was  carried  out 
backward  made  significant  improvement  in  tasks  where  counting 
was  done  in  a  nonconventional  way,  such  as  counting  from  two. 
This  study  suggests  that  reflection  and  discussion  could  be  one  of 
the  strategies  that  educators  can  use  to  promote  the  coordination  of 
counting  principles. 

On  the  basis  of  Piaget’s  (1952)  theory,  some  mathematics  edu¬ 
cation  researchers  (e.g.,  Nunes  &  Bryant,  1996;  Steffe,  1994; 
Steffe  &  Thompson,  2000;  Vergnaud,  2009)  suggest  that  educators 
should focus on helping children form schemes of action to understand
different types of situations. Schema-based instruction in
problem  solving  may  represent  another  promising  way  to  promote 
quantitative  reasoning  (e.g.,  Chen,  1999;  Fuchs,  Fuchs,  Finelli, 
Courey,  &  Hamlett,  2004;  Jitendra  &  Hoff,  1996;  Marshall,  1995). 
The  main  idea  behind  this  way  of  teaching  is  that  children  can 
learn  to  classify  problems  into  problem  types  and  design  a  path  to 
solution  on  the  basis  of  what  they  know  is  similar  to  a  particular 
problem.  For  example,  teachers  may  first  present  some  prototyp¬ 
ical  problems  in  lessons  and  exemplify  the  paths  to  solution.  The 
students  are  then  asked  to  model  the  solutions.  This  teaching 
strategy  encourages  students  to  identify  analogous  problems  and 
thus resort to similar pathways to solution. Because the classification
of problems requires teachers to know
what kind of reasoning is required to solve problems in a particular
situation, it is important for teacher education programs to
ensure that teachers become knowledgeable about the ways in
which different problem situations are classified, for example, on the basis of
different schemes of action for different situations.

Limitations  and  Future  Directions 

With  regard  to  the  limitations  of  this  study,  some  suggestions  for 
future  directions  are  made  in  this  section.  First,  the  present  study 
has  employed  a  longitudinal  design  that  does  not  allow  us  to 
determine  the  causal  relation  between  variables.  To  establish 
whether  additive  reasoning  and  mathematical  achievement  are  in  a 
causal  relation,  both  longitudinal  and  intervention  studies  have  to 
be  used  (Bradley  &  Bryant,  1983).  The  present  study  shows  that 
additive  reasoning  made  an  independent  contribution  to  explaining 
individual  differences  in  mathematical  achievement  beyond  and 
above  working  memory  and  counting  ability.  This  finding  ad¬ 
dresses  the  first  step  of  the  paradigm  to  determine  whether  additive 
reasoning  is  a  potential  cause  of  children’s  mathematical  achieve¬ 
ment.  A  possible  research  project  in  the  future  may  focus  on 
implementing  intervention  programs  aimed  to  enhance  children’s 
additive  reasoning  and  examining  whether  an  improvement  in 
additive  reasoning  would  result  in  significant  progress  in  mathe¬ 
matics  learning.  Second,  although  the  present  study  shows  that  the 
mathematical  thinking  perspective  is  a  useful  theoretical  frame¬ 
work  for  understanding  mathematics  learning  and  education,  one 
cannot  draw  any  conclusion  regarding  whether  it  is  a  better  per¬ 
spective  than  the  number  sense  approach.  To  test  this  research 
question,  future  studies  may  incorporate  measures  of  number  sense 
(e.g.,  numerical  magnitude  comparisons,  number  facts,  number 
line  estimation)  to  evaluate  the  relative  importance  of  number 
sense and quantitative reasoning.

Conclusion

In  conclusion,  the  present  study  has  provided  some  evidence 
regarding  the  contributions  of  working  memory,  counting  ability, 
and  additive  reasoning  to  children’s  mathematical  achievement. 
This  study  is  guided  by  the  mathematical  thinking  perspective  that 
emphasizes  the  importance  of  understanding  the  relations  between 
quantities in mathematics learning. Conceptual knowledge of
counting,  but  not  procedural  counting,  was  a  unique  predictor  of 
calculation  ability  beyond  age,  IQ,  and  working  memory,  but  it  did 
not  contribute  significantly  to  story  problem  solving.  It  was  found 
that  the  central  executive  component  of  working  memory  made 
independent  contributions  to  explaining  variations  in  calculation 
and  story  problem  solving  beyond  the  effects  of  all  the  other 
factors.  Additive  reasoning  (as  assessed  by  knowledge  of  commu¬ 
tativity  and  the  complement  principle)  was  shown  to  be  more 
important  than  counting  ability  and  working  memory  for  chil¬ 
dren’s  mathematics  learning.  It  appears  to  be  an  independent  and 
the  strongest  predictor  of  children’s  mathematical  achievement. 
Despite  several  limitations,  this  study  offers  some  novel  and  ex¬ 
citing  directions  for  more  empirical  investigations  in  mathematics 
learning  and  education  with  a  focus  on  quantitative  reasoning  in 
the future.

The importance of fraction knowledge to later mathematics achievement, along with U.S. students’ poor
knowledge  of  fraction  concepts  and  procedures,  has  prompted  research  on  the  development  of  fraction 
learning.  In  the  present  study,  participants’  ( N  =  536)  development  of  fraction  magnitude  understanding 
and  fraction  arithmetic  skills  was  assessed  over  4  time  points  between  4th  and  6th  grades.  Latent 
state-trait  modeling  was  used  to  examine  codevelopment  of  these  2  areas  of  fraction  knowledge.  Fraction 
arithmetic  skill  predicted  later  fraction  magnitude  understanding,  and  conversely,  fraction  magnitude 
understanding  predicted  later  fraction  arithmetic  skill.  The  results  are  consistent  with  a  bidirectional 
model  of  the  development  of  fraction  concepts  and  procedures,  in  which  knowledge  of  one  type 
facilitates  learning  of  the  other  type.  However,  transfer  in  both  directions  between  fraction  arithmetic 
skill  and  fraction  magnitude  understanding  was  more  likely  to  occur  later  in  the  development  of  fraction 
knowledge,  after  fraction  arithmetic  with  unlike  denominators  had  been  taught  in  school  (during  5th 
grade  in  the  current  sample).  Furthermore,  the  effects  of  previous  knowledge  of  the  other  type  were  small 
and  not  nearly  as  substantial  as  the  effects  of  previous  knowledge  on  later  knowledge  of  the  same  type. 
Findings suggest a need for instruction to link fraction magnitude understanding to fraction arithmetic
skill  and  vice  versa. 

Keywords:  mathematics  achievement,  state-trait  models,  conceptual  knowledge,  procedural  knowledge, 
fractions,  transfer 


Children’s  fraction  knowledge  is  foundational  to  their  later 
mathematics  achievement  (Bailey,  Hoard,  Nugent,  &  Geary,  2012; 
Booth  &  Newton,  2012;  Siegler  et  al.,  2012).  Unfortunately,  many 
students  struggle  with  fraction  concepts  and  procedures  (Byrnes  & 
Wasik,  1991;  Hecht  &  Vagi,  2010;  Siegler  &  Pyke,  2013;  Siegler, 
Thompson,  &  Schneider,  2011;  Smith,  Solomon,  &  Carey,  2005; 
Stafylidou  &  Vosniadou,  2004).  Fraction  knowledge  is  essential 
both in everyday life and for learning more advanced
mathematics and science (National Mathematics Advisory Panel
[NMAP], 2008; Siegler et al., 2012).

Knowledge  in  any  mathematical  domain  depends  on  accurate 
understandings  of  both  procedures  (step-by-step  sequences  used  to 
solve  problems)  and  concepts  (generalizable  ideas  and  principles 
that  govern  a  domain;  Geary,  2004).  The  domain  of  fractions  is  no 
exception. Specifically, fraction procedures involve arithmetic operations
with fractions (e.g., adding or multiplying two fractions).
Fraction concepts, on the other hand, involve understanding the
meaning  of  a  fraction,  such  as  how  the  numerator  and  the  denom¬ 
inator  work  together,  and  relations  between  and  among  fractions. 
Understanding  that  fractions  have  magnitudes  that  can  be  ordered 
on  the  number  line  has  been  implicated  as  a  particularly  important 
concept  underlying  children’s  low  performance  on  conventional 
tests  of  fraction  knowledge  (Siegler  &  Lortie-Forgues,  2014;  Sieg¬ 
ler  et  al.,  2011).  Although  fraction  magnitude  understanding  is  not 
the  only  facet  of  fraction  conceptual  knowledge,  it  provides  a 
unifying  structure  for  learning  other  fraction  concepts  (Hecht, 
1998;  Siegler  et  al.,  2011). 

Children’s  understandings  of  fraction  magnitudes  and  frac¬ 
tion  arithmetic  skill  have  been  hypothesized  to  develop  itera¬ 
tively,  with  knowledge  of  one  type  enhancing  children’s  learn¬ 
ing  of  the  other  type.  However,  the  evidence  for  this  hypothesis 
is  mixed,  partially  because  of  variability  in  previous  nonexperi- 
mental  methods  used  to  test  this  hypothesis.  The  purpose  of  the 
present  study  is  to  reconcile  mixed  findings  from  previous  work 
and  to  present  a  more  rigorous  nonexperimental  test  of  the 
processes  underlying  the  codevelopment  of  children’s  knowl¬ 
edge  of  fraction  magnitudes  and  fraction  arithmetic.  Below,  we 
describe  previous  theoretical  and  empirical  work  on  the  devel¬ 
opment  of  children’s  fraction  magnitude  knowledge  and  frac¬ 
tion  arithmetic  skill,  and  argue  how  the  current  study  addresses 
plausible  threats  to  internal  validity  that  have  limited  findings 
from other studies.

The Path From Magnitude Knowledge to Arithmetic Skills

The  link  between  children’s  understanding  of  numerical  mag¬ 
nitudes  and  their  learning  of  arithmetic  has  strong  theoretical 
underpinnings.  Children  appear  to  access  magnitude  representa¬ 
tions  when  calculating  with  whole  numbers.  For  example,  children 
take  less  time  to  reject  errors  on  arithmetic  combinations  that  are 
farther  from  the  correct  answer  compared  to  ones  closer  to  the 
answer  (e.g.,  when  presented  with  the  problems  “4  +  3  =  8”  and
“4  +  3  =  10,”  children  take  less  time  to  reject  the  latter  as  incorrect;
Ashcraft,  1982).  Additionally,  there  is  a  correlation  between  the 
magnitudes  of  spontaneous  retrieval  errors  and  correct  answers 
across  whole  number  arithmetic  problems  (e.g.,  incorrect  answers 
to  the  problem  4  +  3  are  likely  to  be  closer,  on  average,  to  8  than 
12,  and  incorrect  answers  to  4  +  9  are  likely  to  be  closer  to  12  than 
8;  Lemaire  &  Siegler,  1995). 

Understanding  numerical  magnitudes  may  serve  a  special  pur¬ 
pose  in  children’s  learning  of  fraction  arithmetic.  Fraction  arith¬ 
metic  procedures  consist  of  a  complex  set  of  subroutines,  which 
children  may  omit  or  confuse  during  learning  (e.g.,  a  child  might 
incorrectly  answer  that  2/5  +  3/5  equals  5/10,  or  that  2/5  ×  3/5
equals  6/5).  Understanding  of  fraction  magnitudes  enables  children 
to  reject  these  answers  (e.g.,  2/5  +  3/5  cannot  equal  5/10,  because 
a  sum  of  two  positive  numbers  cannot  be  less  than  one  of  the 
addends,  3/5)  and  makes  them  less  likely  to  persist  with  flawed 
fraction  arithmetic  procedures  (Siegler  et  al.,  2011). 
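
The magnitude-based plausibility check described above can be illustrated with a short sketch. It uses Python's standard `fractions` module; the helper names `plausible_sum` and `plausible_product` are hypothetical labels introduced here for illustration and do not come from the cited studies. The checks encode only two facts: the sum of two positive numbers exceeds each addend, and the product of two positive fractions below one is smaller than each factor.

```python
from fractions import Fraction


def plausible_sum(a, b, candidate):
    """A sum of two positive addends must be larger than each addend."""
    return candidate > a and candidate > b


def plausible_product(a, b, candidate):
    """For 0 < a, b < 1, the product must be positive and smaller than each factor."""
    return 0 < candidate < min(a, b)


a, b = Fraction(2, 5), Fraction(3, 5)

# Erroneous answers from the text: adding numerators and denominators
# separately (5/10), and multiplying numerators only (6/5).
wrong_sum, wrong_product = Fraction(5, 10), Fraction(6, 5)

print(plausible_sum(a, b, wrong_sum))          # 5/10 is below the addend 3/5
print(plausible_sum(a, b, a + b))              # the correct sum
print(plausible_product(a, b, wrong_product))  # 6/5 exceeds both factors
print(plausible_product(a, b, a * b))          # the correct product
```

A check of this kind rejects the two flawed answers without carrying out the correct procedure, which is the sense in which magnitude knowledge may constrain arithmetic errors.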

Several  empirical  findings  suggest  a  causal  link  between  chil¬ 
dren’s  understanding  of  fraction  magnitudes  and  fraction  arithme¬ 
tic.  For  example,  there  are  substantial  concurrent  and  longitudinal 
relations  between  measures  of  fraction  magnitude  understanding 
and  fraction  arithmetic  skill  (e.g.,  Bailey,  Siegler,  &  Geary,  2014; 
Hallett,  Nunes,  &  Bryant,  2010;  Hansen  et  al.,  2015;  Hecht,  1998; 
Hecht,  Close,  &  Santisi,  2003;  Hecht  &  Vagi,  2010,  2012;  Jordan 
et  al.,  2013;  Siegler  et  al.,  2011),  and  countries  with  children  who 
have  higher  average  levels  of  fraction  magnitude  understanding 
have  been  found  to  have  children  with  higher  fraction  arithmetic 
skill  as  well  (Torbeyns,  Schneider,  Xin,  &  Siegler,  2015).  Chil¬ 
dren’s  understanding  of  fraction  magnitudes  predicts  their  re¬ 
sponse  to  a  fraction  arithmetic  intervention  (Byrnes  &  Wasik, 
1991).  Most  directly  relevant  to  this  hypothesis,  a  successful  set  of 
interventions  focused  on  fraction  magnitude  understanding  has 
shown  strong  effects  on  children’s  fraction  arithmetic  skill  (Fuchs 
et  al.,  2013,  2014),  with  these  effects  statistically  mediated  by 
children’s  increases  in  fraction  magnitude  understanding. 

The  Path  From  Arithmetic  Skills  to 
Magnitude  Knowledge 

Rittle-Johnson,  Siegler,  and  Alibali  (2001)  propose  that  a  bidi¬ 
rectional  process  underlies  the  development  of  children’s  concep¬ 
tual  and  procedural  knowledge  in  mathematics  more  generally,  as 
both  types  of  knowledge  improve  and  rely  on  the  ability  to  repre¬ 
sent  the  structure  of  a  problem  accurately.  There  is  strong  evidence 
that  procedural  knowledge  influences  conceptual  understanding  in 
the  development  of  children’s  whole  number  knowledge  (Canobi, 
2009;  Rittle-Johnson  &  Alibali,  1999).  Children’s  knowledge  of 
fraction  concepts  and  procedures  may  similarly  develop  in  a  bidirectional
manner,  with  fraction  magnitude  understanding  influencing  fraction
arithmetic  accuracy  and  vice  versa.

As  children  operate  on  fractions,  it  is  possible  that  they  notice 
links  between  the  magnitudes  of  fractions  they  already  understand 
and  answers  to  problems  in  fraction  form,  with  which  they  might 
be  less  familiar.  For  example,  a  child  might  know  what  1/2  and  3/4 
mean,  understand  that  their  sum  must  lie  between  1  and  1  1/2  on 
a  number  line,  and  (if  the  child  knows  the  procedure  for  adding 
fractions)  that  1/2  +  3/4  can  be  written  as  the  improper  fraction, 
5/4.  Connecting  these  pieces  of  information  may  help  children 
learn  about  the  magnitudes  of  improper  fractions.  In  contrast,  a 
child  who  possesses  basic  knowledge  of  some  fraction  magnitudes 
but  lacks  knowledge  of  the  fraction  addition  procedure  might  fail 
to  learn  more  about  fraction  magnitudes  or  even  harm  their  prior 
fraction  magnitude  understanding  by  practicing  fraction  arithmetic 
(e.g.,  the  student  might  think  that  1/2  +  3/4  is  4/6).  Consistent  with 
the  possibility  of  transfer  from  fraction  arithmetic  knowledge  to 
fraction  magnitude  understanding,  differences  between  Chinese 
and  U.S.  students  in  performance  on  mathematics  tasks  have  been 
found  to  be  larger  on  measures  of  procedural  than  conceptual 
knowledge  (Cai,  1995,  2000;  Torbeyns  et  al.,  2015),  and  adjusting 
for  U.S.-Chinese  differences  in  fraction  arithmetic  accounts  for 
the  entire  Chinese  advantage  in  fraction  magnitude  understanding 
(Bailey  et  al.,  2015).  To  the  extent  that  national  differences  in  one 
domain  (i.e.,  fraction  magnitude  understanding)  result  from  na¬ 
tional  differences  in  another  (i.e.,  fraction  arithmetic),  it  is  possible 
that  the  level  of  proficiency  in  fraction  arithmetic  affects  children’s 
learning  of  fraction  magnitudes. 

However,  some  prior  work  has  failed  to  support  this  hypothe¬ 
sized  path  from  fraction  arithmetic  skill  to  fraction  magnitude 
knowledge.  A  previous  study  that  included  stringent  controls  for
previous  fraction  and  other  mathematics  knowledge  yielded
nonsignificant  effects  of  fraction  arithmetic  skill  on  later  fraction 
conceptual  understanding,  including  magnitude  knowledge  (Hecht 
&  Vagi,  2010). 

Limitations  of  Previous  Research 

Although  the  reciprocal  development  of  fraction  arithmetic  skill 
and  fraction  magnitude  understanding  has  theoretical  and  indirect 
empirical  support,  previous  studies  on  transfer  between  children’s
fraction  magnitude  understanding  and  their  learning  of  fraction
arithmetic  are  limited  in  important  ways.  Most  correlational  stud¬ 
ies  cover  only  a  short  period  of  the  development  of  children’s 
fraction  knowledge  and  do  not  control  for  children’s  prior  perfor¬ 
mance  on  tests  of  the  same  type  of  knowledge.  Studies  that  do 
control  for  children’s  prior  knowledge  face  threats  to  internal 
validity  common  to  the  cross-lagged  panel  design  used  to  analyze 
the  data,  such  as  incomplete  control  for  interindividually  stable 
factors.  These  factors  may  develop  over  time  as  well,  but  generally 
they  show  rank-order  stability  across  children,  influence  learning 
across  development,  and  may  not  be  fully  captured  by  measures  of 
previous  knowledge.  Relatively  stable  differences  may  include 
some  combination  of  domain  general  cognitive  abilities,  environ¬ 
mental  factors,  and  levels  of  specific  previous  knowledge.  Indeed, 
a  recent  analysis  suggests  that  commonly  used  control  variables, 
such  as  intelligence,  working  memory,  and  socioeconomic  status, 
account  for  about  two  thirds  of  the  variance  in  the  stable  interin¬ 
dividual  factors  influencing  the  development  of  children’s  broad 
mathematics  achievement  over  time  (Bailey,  Watts,  Littlefield,  &
Geary,  2014).  Correlational  analyses,  such  as  cross-lagged  panel 
models,  that  do  not  account  for  unmeasured  interindividual  differ¬ 
ences  influencing  children’s  earlier  and  later  learning  likely  yield 
inflated  estimated  causal  effects  of  early  knowledge  on  later  learn¬ 
ing  because  of  the  effects  of  these  omitted  variables  (Hamaker, 
Kuiper,  &  Grasman,  2015;  Rogosa,  1980). 

Furthermore,  intervention  studies  have  not  established  clear 
causal  impacts  between  children’s  fraction  arithmetic  skill  and 
fraction  magnitude  understanding.  Interventions  that  have  been 
shown  to  improve  children’s  fraction  arithmetic  skill  by  focusing 
on  their  magnitude  and  other  conceptual  understanding  (Bottge  et 
al.,  2014;  Fuchs  et  al.,  2013,  2014)  involved  procedural  instruction 
as  well,  so  it  is  difficult  to  know  for  sure  what  part  of  the 
intervention  led  to  gains  in  arithmetic  skill.  These  interventions 
have  great  value,  but  the  precise  causal  mechanisms  through  which 
they  work  are  unclear.  Indeed,  although  all  three  interventions 
included  conceptually  rich  instruction,  they  showed  larger  effects 
on  measures  of  children’s  fraction  arithmetic  skill  than  on  tests  of 
their  fraction  magnitude  and  other  aspects  of  conceptual  under¬ 
standing.  Also,  in  a  less  intensive  intervention  designed  to  boost 
children’s  fraction  arithmetic  skill,  children’s  conceptual  knowl¬ 
edge  was  unaffected  (Byrnes  &  Wasik,  1991). 

Possibility  Versus  Likelihood  of  Transfer 

How  might  we  reconcile  the  strong  theory  and  indirect  evidence 
behind  the  hypothesis  that  children’s  fraction  arithmetic  skill  and 
fraction  magnitude  understanding  develop  iteratively  with  mixed 
evidence  from  intervention  studies  and  nonexperimental  studies? 
There  is  an  important  distinction  between  the  types  of  transfer  that 
can  occur  during  children’s  mathematical  development  and  the 
types  of  transfer  that  do  occur  during  children’s  mathematical 
development.  One  might  argue  that,  if  transfer  between  knowledge 
type  x  and  knowledge  type  y  is  possible,  it  logically  follows  that 
we  should  focus  on  improving  x.  However,  demonstrating  that  x 
can  influence  y  is  insufficient.  We  argue  that  an  equally  important 
consideration  is  whether,  given  previous  work  on  teachers’  and 
students’  mean  levels  of  previous  knowledge,  x  is  likely  to  influ¬ 
ence  y.  Indeed,  if  hypothesized  transfer  from  x  to  y  is  ubiquitous, 
then  increasing  x  is  an  obvious  next  step.  However,  if  hypothesized 
transfer  from  x  to  y  is  not  occurring  in  typical  classrooms,  identi¬ 
fying  ways  to  facilitate  this  transfer  is  at  least  as  important  as 
teaching  skill  x.  Importantly,  experimental  studies  on  transfer 
between  conceptual  and  procedural  mathematical  knowledge  typ¬ 
ically  use  cleverly  designed  and  sequenced  problems  based  on 
theories  about  children’s  learning  and  cognition  to  maximize  the 
probability  of  transfer  between  conceptual  and  procedural  knowl¬ 
edge  (Rittle-Johnson  &  Schneider,  2015).  If  these  practices  are  not 
common  in  classrooms,  then  increasing  x  alone  may  be  insufficient 
to  increase  y;  rather,  researchers  and  practitioners  will  need  to 
focus  on  developing  and  implementing  pedagogy  and  curricula 
that  are  most  likely  to  result  in  transfer. 

There  are  several  reasons  to  believe  that  U.S.  children  might  not 
be  experiencing  the  full  potential  impacts  of  previous  fraction 
knowledge  of  one  type  on  later  fraction  learning  of  another  type. 
Most  troublingly,  children  may  not  see  a  link  between  fraction 
arithmetic  and  magnitudes.  Researchers  have  long  asserted  that 
children  memorize  fraction  arithmetic  procedures  without  understanding
what  they  mean  (Cramer  &  Bezuk,  1991;  Hiebert  &
Wearne,  1986).  If  this  is  true,  and  magnitude  information  is  not
accessed  during  fraction  arithmetic,  it  is  difficult  to  understand 
how  transfer  in  either  direction  would  occur.  Finally,  although  the 
link  between  children’s  representations  of  whole  number  magni¬ 
tudes  and  whole  number  arithmetic  is  well  established,  it  is  pos¬ 
sible  that  this  link  interferes  with  children’s  ability  to  use  fraction 
magnitudes  to  guide  their  understanding  of  fraction  arithmetic. 
Both  middle  school  students  and  preservice  teachers  performed 
lower  than  chance  on  fraction  multiplication  and  division  problems 
with  operands  of  magnitude  less  than  one,  in  which  they  were 
asked  to  judge  whether  the  answer  was  greater  than  or  less  than
the  larger  operand  (Siegler  &  Lortie-Forgues,  2015).  Specifically, 
participants  usually  responded  that  the  answer  to  a  fraction  mul¬ 
tiplication  problem  was  greater  than  the  operands,  and  that  the 
answer  to  a  fraction  division  problem  was  less  than  the  operands, 
strategies  that  would  always  lead  to  correct  answers  on  whole 
number  arithmetic  problems  but  not  on  fraction  arithmetic  prob¬ 
lems.  This  misunderstanding  probably  reflects  the  difficulties  of 
using  information  about  fraction  magnitudes  to  inform  children’s 
limited  understanding  of  fraction  arithmetic  procedures. 

The  Present  Study 

The  goal  of  the  present  study  was  to  determine  how  much 
transfer  does  (rather  than  can)  occur  between  children’s  knowledge
of  fraction  arithmetic  and  fraction  magnitudes  during  the
development  of  both  types  of  knowledge  between  4th  and  6th 
grades.  Our  study  addressed  previous  methodological  limitations 
(see  Limitations  of  Previous  Research,  above)  by  applying  a  state- 
trait  statistical  model  to  a  longitudinal  dataset  containing  multiple 
measures  of  children’s  fraction  arithmetic  skill  and  fraction  mag¬ 
nitude  understanding.  The  state-trait  model  (Steyer,  1987;  Steyer 
&  Schmitt,  1994)  addresses  the  problems  of  omitted  variables  and 
measurement  error  in  models  of  skill  development  by  partitioning 
the  variance  in  a  given  skill  into  trait  (factors  that  affect  mathe¬ 
matics  knowledge  similarly  across  development,  such  as  domain 
general  cognitive  abilities  and  socioeconomic  status)  and  state 
(effects  of  individual  differences  in  previous  mathematics  knowl¬ 
edge  on  subsequent  mathematics  knowledge)  effects,  with  mea¬ 
sured  skills  adjusted  for  measurement  error  at  each  time  point. 

Within  this  approach,  fraction  knowledge  (i.e.,  fraction  magni¬ 
tude  understanding  and  fraction  arithmetic  skill)  at  a  given  time 
point  is  influenced  by  (1)  a  trait-like  factor;  (2)  the  same  type  of 
fraction  knowledge  at  the  immediately  preceding  measurement 
occasion;  (3)  another  type  of  fraction  knowledge  at  the  immedi¬ 
ately  preceding  measurement  occasion;  and  (4)  unique  sources  of 
variation  (e.g.,  measurement  error;  Jackson,  Sher,  &  Wood,  2000). 
Trait  variation  is  likely  composed  of  factors  that  have  a  stable 
influence  on  a  particular  domain  of  children’s  mathematics  learn¬ 
ing  (fraction  arithmetic  or  fraction  magnitude  understanding,  in  the 
present  study)  throughout  development.  Examples  of  factors  hy¬ 
pothesized  to  influence  children’s  knowledge  of  a  mathematical 
domain  across  development  are  intelligence,  attention,  working 
memory,  reading  skills,  and  previous  knowledge  in  mathematics 
that  is  not  changing  during  this  developmental  period  (Deary, 
Strand,  Smith,  &  Fernandes,  2007;  Duncan  et  al.,  2007;  Geary, 
Hoard,  Nugent,  &  Bailey,  2013;  Jordan,  Kaplan,  Ramineni,  & 
Locuniak,  2009;  Szucs,  Devine,  Soltesz,  Nobes,  &  Gabriel,  2014; 
Zentall,  2007).  State  effects,  in  contrast,  include  occasion-specific
variability  in  children’s  fraction  knowledge,  which  may  be  influ¬ 
enced  by  instruction,  effects  of  specific  types  of  fraction  knowl¬ 
edge  on  the  development  of  other  types  of  fraction  knowledge,  or 
other  child  characteristics  changing  over  time.  State  effects  are  of 
theoretical  and  practical  interest,  because  they  represent  the  effects 
of  prior  skill  on  later  skill,  controlling  for  the  effects  of  stable 
factors  influencing  skill  across  development.  Experimentally  in¬ 
duced  effects  on  skill  are  also  independent  of  prior  ability,  if 
treatment  status  is  randomly  assigned.  Therefore,  it  is  plausible 
that  state-trait  models  produce  estimates  of  effects  in  skill  devel¬ 
opment  that  are  more  consistent  with  experimentally  induced 
effects  than  commonly  used  regression  models  (Bailey  et  al., 
2014). 
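
The four influences listed above for the state-trait approach can be written compactly. The following is a simplified sketch in hypothetical notation, not the authors' exact Mplus specification: M and A denote latent fraction magnitude understanding and fraction arithmetic skill, i indexes children, and t indexes measurement occasions after the first. The model as estimated may differ in detail, for example in how the trait factors load on each occasion or in which paths are constrained to be equal across waves.

```latex
% Hypothetical notation: M = magnitude understanding, A = arithmetic skill,
% i = child, t = measurement occasion (t = 2, 3, 4).
\begin{align}
M_{i,t} &= \tau^{M}_{i} + \beta_{MM}\, M_{i,t-1} + \beta_{MA}\, A_{i,t-1} + \varepsilon^{M}_{i,t} \\
A_{i,t} &= \tau^{A}_{i} + \beta_{AA}\, A_{i,t-1} + \beta_{AM}\, M_{i,t-1} + \varepsilon^{A}_{i,t}
\end{align}
```

Here the trait terms \tau correspond to influence (1), the autoregressive paths \beta_{MM} and \beta_{AA} to influence (2), the cross-lagged paths \beta_{MA} and \beta_{AM} to influence (3), and the residuals \varepsilon to influence (4). Transfer between the two types of knowledge is reflected in the cross-lagged paths.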

We  predicted  that  fraction  arithmetic  skill  and  fraction  magni¬ 
tude  understanding  would  be  highly  correlated  at  the  trait  level.  In 
other  words,  we  expected  a  great  deal  of  overlap  between  the 
general  cognitive  factors  that  help  children  to  learn  about  fraction 
magnitudes  and  the  factors  that  help  children  to  learn  about  frac¬ 
tion  arithmetic.  On  the  other  hand,  we  predicted  that  transfer 
between  knowledge  of  fraction  concepts  and  procedures,  as  esti¬ 
mated  by  cross-lagged  regression  paths  between  previous  fraction 
knowledge  of  one  type  and  subsequent  fraction  knowledge  of 
another  type,  would  be  limited,  given  U.S.  children’s  relatively 
shallow  understanding  of  fraction  arithmetic  (Siegler  &  Lortie- 
Forgues,  2015).  However,  we  hypothesized  that  the  development 
of  specific  types  of  fraction  knowledge  would  show  substantial 
state  effects,  estimated  by  regression  paths  between  previous  frac¬ 
tion  knowledge  of  one  type  and  subsequent  fraction  knowledge  of 
the  same  type.  That  is,  we  predicted  that  one  type  of  knowledge 
would  be  substantially  and  uniquely  associated  with  the  same  type 
of  knowledge  at  the  immediately  preceding  measurement  occa¬ 
sion,  controlling  for  the  stable  factors  influencing  mathematics 
learning  throughout  development.  Finally,  based  on  previous  anal¬ 
yses  of  individual  differences  in  children’s  fraction  knowledge 
(Hansen  et  al.,  2015;  Jordan  et  al.,  2013;  Vukovic  et  al.,  2014),  we 
predicted  that  the  estimated  latent  trait  factors  (i.e.,  factors  that 
have  a  stable  influence  on  children’s  math  learning  throughout 
development)  for  each  type  of  fraction  knowledge  would  be 
largely,  but  incompletely,  accounted  for  by  a  rigorous  set  of 
controls  including  domain  general  cognitive  abilities,  previous 
mathematics  knowledge,  reading  achievement,  and  socioeconomic 
status. 

If  transfer  between  children’s  fraction  arithmetic  skill  and  frac¬ 
tion  magnitude  understanding  is  ubiquitous,  then  more  early  in¬ 
struction  on  the  type  of  knowledge  that  shows  greater  transfer  may 
produce  long-term  learning  advantages  in  both  types  of  knowl¬ 
edge.  On  the  other  hand,  if  transfer  between  children’s  fraction 
arithmetic  skill  and  fraction  magnitude  understanding  is  possible 
but  does  not  easily  occur,  instruction  may  be  required  to  focus  on 
the  connections  between  these  types  of  knowledge  before  transfer 
can  occur. 


Method 


Participants 

Participants  were  recruited  from  nine  elementary  schools  in  two 
public  school  districts  in  the  Mid-Atlantic  United  States  as  a  part
of  a  larger  longitudinal  investigation  of  mathematical  learning.  All
third-grade  students  in  the  schools  were  sent  an  informed  consent 
letter  requesting  their  participation  in  the  study.  Informed-consent 
forms  were  received  for  517  third  grade  students;  however,  36 
students  opted  out  before  the  first  assessment.  In  fourth  grade  and 
again  in  fifth  grade,  the  same  letter  was  sent  out  to  replenish  the 
sample  (n  =  27  in  fourth  grade;  n  =  28  in  fifth  grade).  In  total,  536
students  participated  in  the  present  study.  Sample  demographics 
are  found  in  Table  1.  Beginning  in  4th  grade,  all  participants 
reportedly  were  taught  from  curricula  aligned  with  the  Common 
Core  State  Standards  (CCSS;  Council  of  Chief  State  School  Offi¬ 
cers  &  National  Governors  Association  Center  for  Best  Practices, 
2010). 

We  obtained  curriculum  pacing  guides  from  participating  school 
districts,  which  indicated  that  fraction  instruction  occurred  at  ap¬ 
proximately  the  same  time  for  all  participants:  in  early  spring  of 
fourth  grade,  mid  fifth  grade,  and  early  in  sixth  grade.  In  general, 
the  CCSS  indicate  that  fourth  grade  fraction  instruction  focuses  on 
fraction  equivalence  and  ordering  as  well  as  the  decomposition  of 
fractions  (e.g.,  3/8  =  1/8  +  1/8  +  1/8).  Students  also  are  asked  to 
add  and  subtract  fractions  and  mixed  numbers  with  like  denomi¬ 
nators.  In  fifth  grade,  students  learn  addition  and  subtraction  of 
fractions  with  unlike  denominators  as  well  as  early  multiplication 
and  division  of  fractions  (e.g.,  division  of  a  whole  number  by  a  unit 
fraction).  In  sixth  grade,  students  engage  in  more  advanced  mul¬ 
tiplication  and  division  of  fractions  (e.g.,  division  of  a  fraction  by 
a  fraction). 

Outcome  Measures 

Fraction  arithmetic.  The  fraction  arithmetic  measure  (adap¬ 
ted  from  Hecht,  1998)  included  12  paper-and-pencil  fraction  com¬ 
putation  items  involving  addition  and  subtraction  of  fractions 
with  like  and  unlike  denominators.  There  were  six  addition 
items  and  six  subtraction  items.  Eight  of  the  items  had  like 
denominators,  and  four  involved  mixed  numbers.  The  problems 
were:  2/5  +  1/5,  3/4  -  1/4,  3/6  +  1/6,  5/6  -  2/6,  1  3/4  -  1/4,  3/4  +
2/4,  3  3/8  +  1  2/8,  2  2/3  -  1  1/3,  5/6  +  2/3,  7/8  -  1/2,  1  1/3  - 
4/5,  and  3/4  +  2/3.  Internal  reliability  for  the  current  sample
was  good  at  all  time  points  (α  >  .79  at  each  time  point,  with  an
average  reliability  of  α  =  .87  across  time  points).

Table 1
Sample Demographics (n = 530¹)

Characteristic                          %
Gender
  Male                                47.0
  Female                              53.0
Race
  White                               51.8
  Black                               40.0
  Asian/Pacific Island                 5.7
  American Indian/Alaskan Native       2.5
Hispanic                              17.7
Low Income                            60.9
English Learner                       10.6
Special Education²                    10.6

¹ Six students were missing demographic data due to attrition.
² Special education refers to students receiving special education services in third grade.

Fraction  magnitudes.  Fraction  magnitude  understanding  was 
assessed  with  an  established  fraction  number  line  estimation  task 
(Siegler  et  al.,  2011).  Children  estimated  where  fractions  should  go
on  number  lines  ranging  from  zero  to  one  and  zero  to  two.  This 
task  was  administered  on  a  laptop  computer  using  DirectRT  v2012 
and  scored  electronically.  Each  fraction  notation  was  shown  below 
the  center  of  the  number  line.  Children  responded  by  moving  the 
cursor  to  where  they  estimated  the  fraction  was  located  on  the 
number  line,  and  pressing  a  key  to  indicate  their  answer.  After 
each  trial,  a  new  fraction  appeared,  and  the  process  repeated.  After 
each  item,  the  cursor  was  repositioned  to  “0”  on  the  number  line 
to  discourage  students  from  relying  on  their  response  to  the  pre¬ 
vious  item  as  an  anchor. 

To  introduce  students  to  the  task,  the  assessor  demonstrated 
where  a  fraction  (1/8)  should  go  on  the  0  to  1  number  line.  The 
assessor  then  asked  the  student  to  locate  a  practice  fraction  (1/4). 
No  feedback  was  given.  Students  estimated  the  positions  of  the 
following  fractions  on  the  0-1  number  line:  1/5,  13/14,  2/13,  3/7, 
5/8,  1/3,  1/2,  1/19,  and  5/6. 

Students  were  then  given  fractions  to  estimate  on  a  number  line 
ranging  from  zero  to  two.  The  assessor  modeled  two  fractions  (1/8 
and  1  1/8),  and  then  asked  the  student  to  place  a  practice  fraction 
(1/4)  on  the  number  line.  Students  were  asked  to  place  the  follow¬ 
ing  fractions  on  the  0-2  number  line:  1/3,  7/4,  12/13,  1  11/12,  3/2, 
5/6,  5/5,  1/2,  7/6,  1  2/4,  1,  3/8,  1  5/8,  2/3,  1  1/5,  7/9,  1/19,  1  5/6, 
and  4/3. 

Estimation  accuracy  was  assessed  using  percent  absolute  error 
(the  absolute  value  of  the  difference  between  the  estimated  posi¬ 
tion  and  actual  position,  divided  by  the  numerical  range  of  the 
number  line,  multiplied  by  one  hundred).  This  is  a  commonly  used 
method  for  operationalizing  children’s  estimation  accuracy  on 
number  line  estimation  tasks  (Booth  &  Siegler,  2006;  Opfer  & 
Siegler,  2007).  Internal  reliability  for  the  current  sample  was  high 
at  all  time  points  (α  >  .90  at  each  time  point,  with  an  average
reliability  of  α  =  .96).
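
The percent absolute error (PAE) definition above translates directly into code. The sketch below is illustrative only: the function names and the example responses are hypothetical, and averaging PAE across trials is an assumption, since the text defines only the per-trial quantity.

```python
def percent_absolute_error(estimate, actual, line_min, line_max):
    """Percent absolute error for one number line trial."""
    return abs(estimate - actual) / (line_max - line_min) * 100.0


def mean_pae(trials, line_min, line_max):
    """Average PAE over (estimate, actual) pairs from one number line."""
    errors = [percent_absolute_error(e, a, line_min, line_max) for e, a in trials]
    return sum(errors) / len(errors)


# Hypothetical responses: a child places 1/2 at 0.75 on the 0-1 line,
# and 3/2 at 1.0 on the 0-2 line.
pae_0_1 = percent_absolute_error(0.75, 0.5, 0, 1)
pae_0_2 = percent_absolute_error(1.0, 1.5, 0, 2)
```

Dividing by the range of the line puts both number lines on the same scale: an absolute miss of 0.5 on the 0-2 line yields the same PAE as a miss of 0.25 on the 0-1 line.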

Control  Measures 

Language.  The  Peabody  Picture  Vocabulary  Test  (PPVT; 
Dunn  &  Dunn,  2007)  was  used  to  assess  verbal  ability.  In  this  task, 
children  were  shown  four  pictures  and  were  asked  to  point  to  the
one  that  corresponded  to  a  word  spoken  by  the  assessor.  Internal
reliability  of  this  standardized  measure  in  third  grade  is  high,  as
reported  in  the  testing  manual  (α  >  .96;  Dunn  &  Dunn,  2007).

Nonverbal  reasoning.  To  assess  nonverbal  ability,  the  Matrix 
Reasoning  subtest  of  the  Wechsler  Abbreviated  Scale  of  Intelli¬ 
gence  (WASI;  Wechsler,  1999)  was  administered.  Children  were 
presented  with  a  series  of  grids  with  pictures  in  three  out  of  four 
cells.  They  were  asked  to  choose  one  of  five  choices  to  complete 
the  pattern.  Internal  reliability  of  the  measure  is  high  in  third  grade, 
as  reported  in  the  testing  manual  (α  >  .90;  Wechsler,  1999).

Calculation  fluency.  Whole  number  calculation  fluency  was 
assessed  using  the  Addition  Fluency  subtest  of  the  Wechsler 
Individual  Achievement  Test  (WIAT;  The  Psychological  Corpo¬ 
ration,  1992).  Children  were  given  one  minute  to  solve  48  addition 
problems.  Test-retest  reliability  in  third  grade  is  high  as  reported 
in  the  testing  manual  (.87;  The  Psychological  Corporation,  1992). 


Working  memory.  The  Counting  Recall  subtest  of  the  Work¬ 
ing  Memory  Test  Battery  for  Children  (WMTB-C;  Pickering  & 
Gathercole,  2001)  was  used  to  assess  working  memory.  In  this 
task,  children  were  asked  to  count  collections  of  between  four  and 
seven  red  dots  on  individual  cards,  and  then  to  recall  the  number 
of  dots  that  were  counted  on  each  card.  The  number  of  cards  in  a 
series  varied  with  the  span  level  and  ranged  from  one  to  seven. 
Passing  four  items  at  a  level  allowed  children  to  go  to  the  next
level,  where  the  number  of  cards  to  be  remembered  increased
by  one.  Test-retest  reliability  in  third  graders,  as  reported
in  the  testing  manual,  was  .61  (Pickering  &  Gathercole,  2001). 

Attention.  The  inattention  subscale  of  the  SWAN  Rating 
Scale  (Swanson  et  al.,  2006)  was  used  to  measure  inattention. 
Teachers  were  instructed  to  rate  children’s  attention  during  math 
classes  using  a  9-item  questionnaire  on  the  basis  of  the  criteria  for 
attention-deficit/hyperactivity  disorder  for  inattention  from  the  Di¬ 
agnostic  and  Statistical  Manual  of  Mental  Disorders,  4th  Edition 
(American  Psychiatric  Association,  1994).  Teachers  used  a  rating 
scale  of  1-7  (below  average  to  above  average  attention)  for  each 
item  to  rate  children’s  attention  relative  to  other  children  of  the 
same  age.  The  Inattentive  Behavior  subscale  had  high  internal 
consistency  for  this  sample  (α  =  .97).

Reading  fluency.  The  Sight  Word  Efficiency  subtest  of  the 
Test  of  Word  Reading  Efficiency  (TOWRE;  Torgesen,  Wagner,  &
Rashotte,  1999)  was  used  to  assess  reading  fluency.  Students  were 
asked  to  read  as  many  words  aloud  from  a  list  as  they  could  in  45 
seconds.  In  third  grade,  test-retest  reliability  is  .97,  as  reported  in 
the  testing  manual  (Torgesen  et  al.,  1999). 

Income  status.  Income  status  was  determined  by  participation 
in  the  free/reduced  lunch  program  at  school.  This  information  was 
obtained  from  the  school  districts  at  the  start  of  the  study. 

Procedure 

The  fraction  arithmetic  and  fraction  magnitude  tasks  were  administered
at  four  time  points:  (1)  spring  of  fourth  grade,  (2)  fall  of
fifth  grade,  (3)  spring  of  fifth  grade,  and  (4)  winter  of  sixth  grade. 

For  the  fraction  arithmetic  skill  measure,  assessors  administered 
the  assessment  in  a  quiet,  one-on-one  setting  in  fourth  grade  and 
fall  of  fifth  grade.  This  assessment  was  given  in  a  whole-class 
setting  beginning  in  spring  of  fifth  grade.  Students  were  given  10 
min  to  complete  this  measure.  The  number  line  estimation  task  was 
administered  individually  at  each  time  point. 

Control  measures  were  administered  in  third  grade.  Language 
(PPVT),  attention  (SWAN),  nonverbal  reasoning  (WASI-Matrix 
Reasoning),  reading  fluency  (TOWRE),  and  calculation  fluency 
(WIAT-Addition)  were  assessed  during  the  winter  of  third  grade, 
and  working  memory  (WMTB-C-Counting  Recall)  was  assessed
during  the  spring  of  third  grade.  All  control  measures,  with  two 
exceptions,  were  administered  individually  to  children.  The  calcu¬ 
lation  fluency  task  was  administered  in  a  whole-class  setting,  and 
the  attention  measure  was  a  survey  administered  to  teachers. 

Analyses 

All  models  were  estimated  in  Mplus  version  7.11  (Muthén  &
Muthén,  1998-2012),  with  missing  data  handled  using  full  information
maximum  likelihood.  The  fraction  arithmetic  skill  and
fraction  magnitude  understanding  data  were  positively  skewed  and 
therefore  log  transformed,  then  standardized.  The  fraction  number
line  estimation  score  was  then  multiplied  by  −1,  so  that  higher
values  indicated  greater  accuracy,  to  improve  interpretation.  We 
first  estimated  a  separate  state-trait  model  for  each  type  of  fraction 
knowledge.  This  allowed  us  to  identify  appropriate  models  of  the 
development  of  children’s  fraction  magnitude  understanding  and 
fraction  arithmetic  skill,  separating  out  factors  contributing  to 
interindividual  variation  and  path-dependent,  intraindividual  vari¬ 
ation  (described  earlier  under  The  Present  Study).  Then,  we  esti¬ 
mated  the  effects  of  one  type  of  knowledge  on  subsequent  levels  of 
the  other  type  of  knowledge  by  combining  the  developmental 
models  for  both  types  of  knowledge  and  adding  cross-lagged  paths 
and  correlations  between  measures  at  the  same  time  point.  For  all 
models,  each  measure  is  adjusted  for  measurement  error  by  treat¬ 
ing  each  construct  at  each  wave  as  a  latent  variable  with  a  path  to 
the  measured  variable  set  to  the  square  root  of  the  measured 
variable’s  reliability. 
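
As a rough illustration of the data preparation described above, the following Python sketch log-transforms and standardizes a positively skewed score, reverses an error score so that higher values indicate greater accuracy, and computes the fixed loading implied by a given reliability. The function names, the +1 offset applied before taking the log, the example scores, and the reliability of .81 are assumptions made for illustration only; the models themselves were estimated in Mplus, and this is not the authors' code.

```python
import math
import statistics


def log_standardize(scores, flip_sign=False):
    """Log-transform positively skewed scores, then z-standardize them.

    The +1 offset is an assumption so that scores of zero stay defined.
    """
    logged = [math.log(s + 1.0) for s in scores]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    z = [(v - mean) / sd for v in logged]
    # Error scores (e.g., percent absolute error) are multiplied by -1
    # so that higher values indicate greater accuracy.
    return [-v for v in z] if flip_sign else z


def fixed_loading(reliability):
    """Return the loading of a measured variable on its latent variable
    (square root of the reliability) and the implied error variance,
    assuming a standardized measure and a unit-variance latent variable."""
    loading = math.sqrt(reliability)
    error_variance = 1.0 - reliability
    return loading, error_variance


# Hypothetical percent absolute error scores for five children.
pae = [1.8, 5.0, 12.5, 19.0, 39.0]
z_accuracy = log_standardize(pae, flip_sign=True)

# Hypothetical reliability of .81 for one measure at one wave.
loading, error_var = fixed_loading(0.81)
print(f"loading = {loading:.2f}, error variance = {error_var:.2f}")
```

Fixing the loading rather than estimating it is what allows a single indicator per wave to be corrected for measurement error; the price is that the reliability estimate is treated as known.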

Results 

Untransformed means and standard deviations of all measures
appear in Table 2. The correlation matrix for all variables included
in these analyses appears in Table 3. First, we tested whether
individual differences in fraction arithmetic skill and fraction
magnitude understanding appeared to develop as a state-like process
(e.g., prior knowledge influences later learning), a trait-like
process (e.g., stable factors that influence learning of a specific area
of mathematics throughout development influence later knowledge),
or a combination of the two. Comparing the interwave correlations
of each construct is a useful way to conceptualize these
alternatives. Imagine that the interwave correlation of a fractions measure
between Time 1 and Time 2 was x. If only “trait” factors influence
children’s fraction understanding, we would expect the correlation
between Time 2 knowledge and Time 3 knowledge to also be x, a
reflection of the stable characteristics influencing mathematics
learning across time. On the other hand, if fraction knowledge
were a “state” dependent process, that is, if fraction knowledge is
a function of knowledge at the immediately preceding wave and
error, then the correlation between knowledge at two waves
should decay as the distance in time between the measurements
increases and transient factors other than previous
knowledge accumulate in their influence on knowledge over time.
More precisely, the predicted average correlation two time points
away would be the effect of Time 1 on Time 2 fraction
understanding times the effect of Time 2 on Time 3 fraction
understanding, or x².

Table 2

Means and Standard Deviations of Measures for Study Participants

Measure                                   Mean (SD)       Min. Score  Max. Score

Outcome Measures
  Fraction magnitude understanding (PAE)
    Time 1 (spring 4th)                   19.15 (9.33)    1.81%       39.15%
    Time 2 (fall 5th)                     19.16 (10.46)   2.25%       43.03%
    Time 3 (spring 5th)                   15.75 (10.26)   1.44%       47.89%
    Time 4 (winter 6th)                   12.95 (16.75)   1.66%       39.12%
  Fraction arithmetic skill (12 items)
    Time 1 (spring 4th)                   5.62 (3.09)     .00         8.00
    Time 2 (fall 5th)                     5.81 (3.43)     .00         10.00
    Time 3 (spring 5th)                   7.68 (2.60)     .00         12.00
    Time 4 (winter 6th)                   8.19 (2.65)     .00         12.00
Predictor Measures
  Calculation fluency (percentile)        51.42 (26.68)   .10         99.00
  Attention (raw; maximum score of 63)    36.76 (12.01)   9.00        63.00
  Language (percentile)                   47.16 (28.63)   1.00        99.90
  Nonverbal reasoning (scaled; M = 10)    9.81 (3.26)     2.00        17.00
  Reading fluency (percentile)            64.38 (24.74)   .90         99.00
  Working memory (percentile)             31.45 (29.15)   .10         99.00

Note. PAE refers to percent absolute error. Fraction arithmetic skill is
reported as number of items correct.

The average interwave correlation of fraction magnitude
understanding at consecutive time points was .88. The observed average
interwave correlation between measures at waves two time points
away was .80. Similarly, the average interwave correlation
between measures at waves three time points away was .73 (in
between a predicted “state-only” correlation of .88³, or .68, and a
predicted “trait-only” correlation of .88). The correlations between
fraction arithmetic skill scores also decayed over time, but
remained more stable than would be predicted by a purely
autoregressive model: Average interwave correlations for this measure at
waves one, two, and three time points away (with corresponding
values predicted by purely state and purely trait models) were .54
(.54, .54), .40 (.29, .54), and .33 (.16, .54). Overall, both outcomes
showed moderate to high correlations with each other and control
variables, as expected.
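
The predicted values quoted above follow from simple arithmetic, which can be sketched in Python as follows. Under a purely state (autoregressive) account the correlation at a lag of k waves is the adjacent-wave correlation raised to the power k; under a purely trait account it stays at the adjacent-wave value. The function name and output layout are illustrative assumptions; the input correlations (.88 and .54) and the observed values are those reported in the text.

```python
def predicted_correlations(r_adjacent, max_lag=3):
    """Return (lag, state_only, trait_only) predictions for each lag.

    state_only: r_adjacent ** lag (correlations decay multiplicatively)
    trait_only: r_adjacent        (correlations do not decay)
    """
    rows = []
    for lag in range(1, max_lag + 1):
        state_only = round(r_adjacent ** lag, 2)
        trait_only = round(r_adjacent, 2)
        rows.append((lag, state_only, trait_only))
    return rows


# Average adjacent-wave correlations reported in the text.
magnitude = predicted_correlations(0.88)
arithmetic = predicted_correlations(0.54)

# Observed average correlations at lags 1, 2, and 3, as reported.
observed = {
    "magnitude": [0.88, 0.80, 0.73],
    "arithmetic": [0.54, 0.40, 0.33],
}

for name, rows in (("magnitude", magnitude), ("arithmetic", arithmetic)):
    for (lag, state_only, trait_only), obs in zip(rows, observed[name]):
        print(f"{name:10s} lag {lag}: state {state_only:.2f}, "
              f"observed {obs:.2f}, trait {trait_only:.2f}")
```

For both constructs the observed correlations at lags of two and three waves fall between the state-only and trait-only predictions, which is the pattern the paragraph describes.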

State-Trait  Models 

As  previously  shown,  there  were  high  correlations  between  the 
measure  of  fraction  magnitude  understanding  measured  at  differ¬ 
ent  waves.  Upon  further  investigation,  we  found  that  a  more 
parsimonious  state-only  model  fit  the  development  of  fraction 
magnitude  understanding  as  well  as  a  state-trait  model.  Based  on 
Kline’s (2005) guidelines of an RMSEA below .08 and a CFI
above .90, this state-only model fit the data acceptably well
(χ²[3] = 11.5, p = .009; RMSEA = .07; CFI = .996), indicating
that fraction magnitude understanding develops as a largely
path-dependent process during this developmental period. In other
words,  fraction  magnitude  understanding  at  one  time  point  is 
highly  dependent  on  fraction  magnitude  understanding  at  a  prior 
time  point.  The  development  of  fraction  arithmetic  skill  was  best 
fit by a state-trait model, which fit the data very well (χ²[1] = .7,
p  =  .39;  RMSEA  =  .00;  CFI  =  1.000),  indicating  that  fraction 
arithmetic  skill  develops  as  a  combination  of  stable  between-child 
differences  (perhaps  including  domain  general  abilities,  some 
types  of  prior  mathematics  achievement,  and  socioeconomic  sta¬ 
tus)  and  path  dependent  learning  during  this  developmental  period. 
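
The fit criteria cited above can be expressed as a simple check. The sketch below, in Python, applies the RMSEA and CFI cutoffs attributed to Kline (2005) to the two sets of fit indices reported in this section; the function name and dictionary layout are assumptions for illustration, and a real model comparison would of course weigh more than two indices.

```python
def acceptable_fit(rmsea, cfi, rmsea_max=0.08, cfi_min=0.90):
    """Return True when RMSEA is below and CFI is above the cutoffs
    cited in the text (RMSEA < .08, CFI > .90)."""
    return rmsea < rmsea_max and cfi > cfi_min


# Fit indices as reported for the two single-construct models.
models = {
    "fraction magnitude understanding (state-only)": {"rmsea": 0.07, "cfi": 0.996},
    "fraction arithmetic skill (state-trait)": {"rmsea": 0.00, "cfi": 1.000},
}

results = {name: acceptable_fit(**fit) for name, fit in models.items()}
for name, ok in results.items():
    print(f"{name}: {'acceptable' if ok else 'not acceptable'}")
```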

To  estimate  the  effects  of  one  type  of  knowledge  on  subsequent 
levels  of  the  other  type  of  knowledge,  we  combined  the  models  for 
both  types  of  knowledge,  described  above,  and  added  bidirectional 
cross-lagged  paths  and  correlations  between  measures  at  the  same 
time  point  (see  Figure  1).  Because  a  trait  factor  was  included  for 
children’s fraction arithmetic skill, but not their fraction magnitude


Table 3

Correlation Matrix for All Variables

Variable                                  1        2        3        4        5        6        7        8        9        10       11       12       13       14       15

1. Fraction magnitude understanding (T1)  –
2. Fraction magnitude understanding (T2)  .895**   –
3. Fraction magnitude understanding (T3)  .826**   .872**   –
4. Fraction magnitude understanding (T4)  .734**   .779**   .860**   –
5. Fraction arithmetic skill (T1)         .374**   .393**   .423**   .398**   –
6. Fraction arithmetic skill (T2)         .408**   .438**   .468**   .436**   .645**   –
7. Fraction arithmetic skill (T3)         .325**   .350**   .401**   .417**   .410**   .430**   –
8. Fraction arithmetic skill (T4)         .341**   .388**   .405**   .458**   .329**   .382**   .544**   –
9. Low income                             -.245**  -.275**  -.230**  -.263**  -.089    -.108*   -.121*   -.092    –
10. Calculation fluency                   .353**   .403**   .398**   .405**   .264**   .277**   .341**   .323**   -.138**  –
11. Attention                             .382**   .406**   .431**   .440**   .311**   .369**   .391**   .319**   -.266**  .320**   –
12. Language                              .472**   .514**   .477**   .505**   .272**   .297**   .297**   .284**   -.399**  .187**   .373**   –
13. Nonverbal reasoning                   .405**   .424**   .368**   .443**   .295**   .322**   .319**   .210**   -.277**  .208**   .412**   .483**   –
14. Working memory                        .314**   .309**   .319**   .295**   .286**   .298**   .285**   .186**   -.110*   .258**   .340**   .269**   .355**   –
15. Reading fluency                       .301**   .337**   .337**   .344**   .217**   .231**   .379**   .310**   -.246**  .353**   .449**   .343**   .229**   .217**   –

Note. T1 = spring 4th grade, T2 = fall 5th grade, T3 = spring 5th grade, T4 = winter 6th grade. “Low income” refers to students receiving free/reduced
price lunch. Sections of the correlation matrix containing the same construct assessed at different timepoints are in bold.

*p < .05. **p < .01.


understanding,  the  model  has  features  of  both  a  state-trait  model 
and  a  cross-lagged  panel  model.  Trait  fraction  arithmetic  knowl¬ 
edge  and  fraction  magnitude  understanding  at  the  first  wave  were 
regressed  on  all  control  variables.  Thus  the  model  posits  that  prior 
abilities  and  knowledge  are  related  to  fraction  magnitude  under¬ 
standing  primarily  via  prior  fraction  magnitude  understanding  and 
fraction  arithmetic  knowledge,  while  they  are  directly  related  to 
fraction  arithmetic  at  each  wave.  As  displayed  in  Figure  1,  the 
model  fit  the  data  well. 

The  key  prediction  of  a  bidirectional  model  of  children’s  frac¬ 
tion  development  is  that  there  should  be  statistically  significant 
cross-lagged  paths  from  prior  fraction  arithmetic  knowledge  to 
later  fraction  magnitude  understanding  and  from  prior  fraction 
magnitude  understanding  to  later  fraction  arithmetic  knowledge. 
Findings  showed  some  evidence  for  this  pattern.  Specifically, 
children’s  previous  fraction  arithmetic  skill  significantly  predicted 
their  later  fraction  magnitude  understanding  between  Times  2  (fall 
of  fifth  grade)  and  3  (spring  of  fifth  grade)  and  between  Times  3 
(spring  of  fifth  grade)  and  4  (winter  of  sixth  grade),  and  children’s 
previous  fraction  magnitude  understanding  predicted  their  later 
fraction  arithmetic  skill  between  the  spring  of  fifth  grade  and  the 
winter  of  sixth  grade.  No  evidence  of  transfer  was  found  between 
Times  1  and  2,  before  children  received  instruction  (as  indicated 
by  the  schools’  curriculum  pacing  guide)  on  fraction  arithmetic  for 
operands  with  different  denominators.  Correlations  in  the  residuals 
of  both  types  of  fraction  knowledge  across  waves  showed  a  similar 
pattern  across  time,  with  the  largest  correlations  occurring  at  the 
last  two  waves.  This  indicates  that  fraction  magnitude  knowledge 
and  fraction  arithmetic  skill  changed  together  more  during  the  last 
two  waves  as  well. 


Control  variables,  on  which  Time  1  fraction  magnitude  under¬ 
standing  and  trait  fraction  arithmetic  skill  were  regressed,  ac¬ 
counted  for  50%  of  the  variance  in  trait  fraction  arithmetic  skill 
and  41%  of  the  variance  in  fraction  magnitude  understanding  at 
Time  1  (spring  of  4th  grade).  Importantly,  the  correlation  between 
trait  fraction  arithmetic  skill  and  Time  1  fraction  magnitude  un¬ 
derstanding  after  controlling  for  these  variables  was  only  .23,  and 
not  statistically  significant.  This  indicates  that  the  factors  other 
than  cognitive  skills,  reading  achievement,  and  previous  whole 
number  knowledge  influencing  fraction  magnitude  understanding 
and  fraction  arithmetic  skill  were  mostly  specific  to  each  type  of 
knowledge,  rather  than  shared  across  types.  This  amount  of  re¬ 
maining  specific  variance  was  substantial  (50%  of  the  variance  in 
trait  fraction  arithmetic  skill  and  59%  of  the  variance  in  fraction 
magnitude  understanding)  and  warrants  further  investigation. 

The  remaining  developmental  variability  of  children’s  fraction 
arithmetic  skill  was  characterized  by  variable  trait  and  state  path 
coefficients.  Trait  path  coefficients  ranged  from  .17  (Time  4; 
winter  6th  grade)  to  .94  (Time  3;  spring  5th  grade).  The  state  path 
coefficient  between  Time  2  (fall  5th  grade)  and  Time  3  was  not 
significant,  whereas  the  state  path  coefficients  between  the  other 
two  pairs  of  time  points  were  moderate  to  large.  Notably,  this 
interval  between  Time  2  and  Time  3  was  the  period  in  which 
children  received  the  most  effective  arithmetic  instruction  (see 
Table  2).  Therefore,  this  pattern  may  indicate  that  individual 
differences  in  children’s  responses  to  instruction  are  more  influ¬ 
enced  by  the  stable  factors,  such  as  domain  general  abilities  and 
relatively  stable  environmental  factors,  contributing  to  their  frac¬ 
tion  arithmetic  learning  throughout  development  than  by  their 
previous knowledge alone. In contrast, learning during other
periods may be influenced at least as much by prior domain
knowledge as by stable factors influencing children’s fraction arithmetic
learning throughout development.

Figure 1. Model of the codevelopment of children’s fraction magnitude
understanding and fraction arithmetic skill. Model fit: χ²[49] = 124.07,
p < .001; RMSEA = .053; CFI = .979. Numerical suffixes: 1 = spring
4th grade, 2 = fall 5th grade, 3 = spring 5th grade, 4 = winter 6th grade.
All paths were statistically significant, p < .01, except those indicated with
ns (p > .05) or † (p = .06). Residual variances for each latent variable are
italicized. Not pictured: Each measure is adjusted for measurement error by
treating each timepoint as a latent variable with a path to the measured
variable set to the square root of the measured variable’s reliability; latent
trait fraction arithmetic skill (top) and fraction magnitude understanding at
the first timepoint are regressed on all control measures listed in the
Methods section, so .23 is the correlation between residual trait fraction
arithmetic skill and residual Time 1 fraction magnitude understanding.

The  state  path  coefficients  estimated  in  the  development  of 
children’s  fraction  magnitude  understanding  were  very  large,  rang¬ 
ing  from  .83  to  .94,  consistent  with  the  interpretation  that  chil¬ 
dren’s  magnitude  understanding  depends  substantially  on  their 
previous  knowledge  of  fraction  magnitudes. 

Discussion 

The  present  study  examined  the  codevelopment  of  fraction 
arithmetic  skill  and  fraction  magnitude  understanding  using  a 
state-trait  modeling  approach.  Transfer  from  fraction  arithmetic 
skill  to  fraction  magnitude  understanding  occurred  between  the 
last  two  pairs  of  waves  of  measurement  (fall  fifth  grade  to  spring 
fifth  grade  and  spring  fifth  grade  to  winter  sixth  grade);  transfer 
from  fraction  magnitude  understanding  to  fraction  arithmetic  skill 
occurred  only  between  the  last  pair  of  waves  of  measurement 
(spring  fifth  grade  to  winter  sixth  grade),  after  children  received 
instruction  on  fraction  arithmetic  for  operands  with  different  de¬ 
nominators. 

Although  transfer  from  early  understanding  of  fraction  magni¬ 
tudes  (e.g.,  in  fourth  grade)  to  fraction  arithmetic  skill  might 
happen  under  some  circumstances,  we  did  not  find  evidence  that 


fraction  arithmetic  skill  plays  a  major  role  in  the  early  develop¬ 
ment  of  fraction  magnitude  understanding,  or  vice  versa.  One 
possible  explanation  for  little  transfer  early  in  development  and 
some  transfer  later  in  development  is  that,  because  a  large  number 
of  U.S.  children  in  middle  childhood  have  little  understanding  of 
fraction  magnitudes  and  memorize  fraction  arithmetic  procedures 
(Cramer & Bezuk, 1991; Hiebert & Wearne, 1986; Siegler et al.,
2011),  they  do  not  initially  access  either  type  of  knowledge  while 
learning  the  other  type.  Limited  early  transfer  may  also  reflect  the 
nature of early fraction calculation: initially, fourth graders add
and subtract fractions with like denominators, which requires little
understanding and a limited number of procedural rules (i.e.,
perform the operation on numerators, but not denominators).
However, after children learn more about fraction arithmetic and
fraction magnitudes, some may see how these skills support each
other,  which  makes  transfer  more  likely.  For  example,  students 
may  initially  learn  to  add  fractions  with  unlike  denominators  by 
memorizing  a  procedure;  later,  their  knowledge  of  fraction  mag¬ 
nitudes  may  help  them  visualize  the  equivalence  of  an  addend 
before  and  after  translating  it  into  a  fraction  with  a  denominator 
common  to  both  addends.  This  knowledge  may  also  help  children 
reject  implausible  answers  resulting  from  procedural  errors  in 
translating  addends.  In  other  words,  learning  about  fraction  mag¬ 
nitudes  may  help  children  make  sense  of  earlier  taught  procedures, 
and  practicing  fraction  arithmetic  procedures  may  help  children 
notice  connections  between  the  magnitudes  of  operands  and  an¬ 
swers.  This  process,  labeled  backward  reaching  transfer  (Perkins 
&  Salomon,  1988),  may  be  more  likely  than  forward  reaching 
transfer  in  children’s  fraction  development,  particularly  if  children 
are  not  explicitly  instructed  to  think  about  fraction  magnitudes 
while  they  are  initially  learning  about  fraction  arithmetic. 

Children’s  fraction  magnitude  understanding  and  fraction  arith¬ 
metic  skill  showed  substantial  state  dependent  effects  (see  Figure 
1);  that  is,  the  model  predicts  that  experimentally  induced  changes 
in  these  skills  would  have  persistent  effects  during  the  develop¬ 
mental  period  covered  by  the  current  study.  Indeed,  it  was  not  even 
necessary  to  include  a  factor  representing  relatively  stable  skills 
and  environments  influencing  children’s  learning  of  fraction  mag¬ 
nitudes  independently  at  different  times  throughout  development 
in  the  model  of  the  development  of  children’s  fraction  magnitude 
understanding.  These  path  coefficients  from  previous  knowledge 
to  later  knowledge  were  generally  larger  than  the  state  path  coef¬ 
ficients  reported  by  Bailey  and  colleagues  (2014)  in  their  models 
of  the  development  of  children’s  general  mathematics  achievement 
(i.e.,  Bailey  and  colleagues  found  state  path  coefficients  ranging 
from  .09  to  .34).  Thus,  our  model  predicts  that  early  instructional 
support  that  directly  targets  children’s  fraction  arithmetic  skills 
and  magnitude  understanding  is  likely  to  produce  more  persistent 
effects  on  these  same  specific  skills  than  early  support  that  targets 
mathematics  achievement  more  broadly  will  produce  on  general 
mathematics  achievement  measures.  Perhaps  the  different  pattern 
of  state  and  trait  effects  observed  in  the  present  study,  compared  to 
Bailey  et  al.  (2014),  is  attributable  to  the  specificity  of  the  mea¬ 
sured  skills:  mathematics  achievement  is  a  very  general  construct, 
which  consists  of  an  essentially  limitless  amount  of  knowledge, 
whereas  fraction  arithmetic  skill  and  fraction  magnitude  under¬ 
standing  can  be  mastered  with  a  limited  amount  of  knowledge. 
Consistent  with  this  hypothesis,  state  effects  were  larger  for  frac¬ 
tion  magnitude  understanding  than  for  fraction  arithmetic  skill,  the 
latter of which is a more heterogeneous set of knowledge than the
former.  Therefore,  time-specific  changes  in  a  child’s  relative  level 
of  specific  skills  are  likely  to  be  more  persistent  than  time-specific 
changes  in  a  child’s  relative  level  of  general  mathematics  achieve¬ 
ment. 

Limitations 

The  results  from  the  present  study  should  be  interpreted  cau¬ 
tiously  because  of  several  limitations.  First,  participants  were  more 
socioeconomically  disadvantaged  than  the  national  average,  which 
limits  this  study’s  generalizability.  This  study  also  did  not  explic¬ 
itly  address  the  influence  of  classroom  instruction.  Therefore,  we 
cannot  make  claims  about  the  specific  instructional  processes  that 
affect children’s development of fraction knowledge. If more
references to fraction magnitudes were included in the teaching of early
fraction  arithmetic  procedures,  transfer  from  fraction  magnitude 
understanding  to  fraction  arithmetic  skill  might  increase.  On  the 
other  hand,  in  describing  the  development  of  children’s  fraction 
learning,  our  findings  make  predictions  about  the  likely  effects  of 
changing  specific  aspects  of  mathematics  curricula,  leaving  other 
things  the  same.  Primarily,  between  fourth  and  sixth  grades,  mar¬ 
ginal  gains  in  children’s  fraction  magnitude  understanding  alone 
are  not  likely  to  have  a  major  effect  on  the  development  of 
children’s  fraction  arithmetic  skill.  Perhaps  focusing  on  interven¬ 
tions  that  integrate  instruction  on  fraction  magnitude  understand¬ 
ing  and  fraction  arithmetic  (e.g.,  Bottge  et  al.,  2014;  Fuchs  et  al., 
2013,  2014)  will  yield  the  greatest  boosts  in  children’s  fraction 
knowledge. 

Finally,  we  only  examined  one  aspect  of  children’s  fraction 
conceptual  understanding.  Perhaps  fraction  arithmetic  knowledge 
has  stronger  causal  relations  with  fraction  concepts  other  than 
children’s  understanding  of  the  magnitudes  of  single  fractions, 
such  as  children’s  understanding  of  the  relational  nature  between 
the  numerator  and  the  denominator  (DeWolf,  Bassok,  &  Holyoak, 
2015)  and  how  operands  and  answers  are  conceptually  related  for 
different  fraction  arithmetic  operations  (Siegler  &  Lortie-Forgues, 
2015). 

Directions  for  Future  Research 

Given  the  small  transfer  effects  observed  in  the  current  study,  a 
better  understanding  of  classroom  practice  related  to  instruction  on 
fraction  arithmetic  and  magnitudes  is  needed.  Although  analysis  of 
classroom  instruction  was  outside  the  scope  of  the  current  study, 
future  research  might  collect  information  about  fraction  instruction 
and  classroom  practices  through  observations  to  identify  teaching 
practices  and  activities  that  might  be  especially  likely  to  facilitate 
transfer  between  fraction  magnitude  understanding  and  fraction 
arithmetic  skills. 

The  control  variables  in  the  current  study  accounted  for  approx¬ 
imately  half  of  the  variance  in  the  factors  that  influence  children’s 
fraction  arithmetic  skill  similarly  across  development,  and  the 
correlation  between  these  factors  and  children’s  fraction  magni¬ 
tude  understanding  after  adjusting  for  these  controls  was  small. 
This  raises  an  important  question:  What  other  factors  have  specific 
influences  on  children’s  learning  of  fraction  arithmetic  skill  or 
magnitude  understanding  during  this  time  period?  One  possibility 
is  that  these  types  of  knowledge  are  influenced  by  relatively  stable 


previous  levels  of  the  same  type  of  knowledge,  perhaps  caused  by 
difficult-to-measure  contextual  factors  in  the  classroom.  Further¬ 
more,  in  the  present  study,  the  predicted  effects  of  children’s  initial 
fraction  magnitude  understanding  on  their  later  fraction  magnitude 
understanding  and  the  predicted  effects  of  children’s  trait  fraction 
arithmetic  skill  on  their  fraction  arithmetic  skill  were  generally 
large  throughout  development.  Therefore,  identifying  factors  that 
contribute  to  children’s  initial  fraction  magnitude  understanding 
and  fraction  arithmetic  skill  throughout  development  may  provide 
important  insights  into  the  timing  and  targets  of  early  interventions 
designed  to  boost  children’s  fraction  knowledge. 

Conclusion 

In  summary,  transfer  from  fraction  arithmetic  to  fraction  mag¬ 
nitude  understanding  and  vice  versa  during  the  development  of 
children’s  fraction  knowledge  appears  to  happen  later  during  their 
learning  rather  than  earlier,  and  is  not  nearly  as  substantial  as  the 
effects  of  previous  knowledge  on  later  knowledge  of  the  same 
type.  However,  as  noted  above,  we  do  not  intend  to  argue  that 
fraction  magnitude  understanding  cannot  facilitate  the  learning  of 
fraction  procedures  early  in  instruction.  The  theorized  processes 
through  which  this  might  occur  are  plausible  and  are  supported  by 
a  broad  literature  on  transfer  between  knowledge  of  concepts  and 
procedures  in  mathematics,  which  includes  many  experimental 
studies  (Rittle-Johnson  &  Schneider,  2015).  Most  importantly,  an 
intensive  intervention  primarily  based  on  fraction  magnitude  un¬ 
derstanding  produced  substantial  gains  relative  to  a  control  group, 
even  in  fraction  arithmetic  skill  (Fuchs  et  al.,  2013),  although  as 
noted  previously,  children  also  received  explicit  fraction  arithmetic 
instruction during this intervention. Furthermore, experimental
investigations should examine efficient ways to teach children
fraction magnitude understanding and fraction arithmetic together
(Bottge et al., 2014; Fuchs et al., 2013, 2014).

Arithmetic word-problem solving is an important component of elementary mathematics curricula that
links school mathematics to real-life problem solving. The present 3-year longitudinal study examined
children’s arithmetic word-problem solving through understanding its 2 component processes:
number-sentence construction and computation. Chinese first graders (n = 153) were tested on their arithmetic
word-problem solving, in which they wrote down the number sentences before they solved the problems.
They were also given a parallel test of arithmetic computation. Various cognitive predictors and
mathematical outcomes were assessed. It was found that the children’s difficulty in solving arithmetic
word problems lay more in writing number sentences than in computation. The results from path
analysis showed that word reading and various numerical-magnitude processing and domain-general
skills significantly predicted arithmetic computation, whereas only domain-general skills significantly
predicted number-sentence construction. Both number-sentence construction and computation
significantly predicted future arithmetic computation and mathematics achievement even after controlling for
previous arithmetic computation. Theoretical and practical implications are discussed.

Keywords:  arithmetic  word-problem  solving,  number-sentence  construction,  computation 


Mathematics  is  undoubtedly  one  of  the  fundamental  skills  that 
children  need  to  acquire  during  their  school  years.  Proficiency  in 
mathematics  is  related  to  both  financial  (Ritchie  &  Bates,  2013) 
and  psychological  (Parsons  &  Bynner,  2005)  well-being.  Previous 
studies  tend  to  focus  on  two  particular  kinds  of  mathematical 
skills: arithmetic computation and arithmetic word-problem
solving. Both have received substantial attention in the psychological
literature (e.g., Fuchs et al., 2010, 2014; Träff, 2013; Zhang &
Lin,  2015).  Arithmetic  computation  and  arithmetic  word-problem 
solving  are  similar  in  that  they  both  require  the  problem  solver  to 
solve  an  arithmetic  problem.  However,  number  sentences  (e.g., 
8  +  7  =)  are  provided  in  arithmetic  computation  whereas  the 
problem  solvers  need  to  construct  their  own  number  sentences  in 
arithmetic  word-problem  solving.  This  additional  process  of  con¬ 
structing  number  sentences  seems  to  cause  difficulty  to  problem 
solvers  (Lewis  &  Mayer,  1987).  Furthermore,  arithmetic  word- 
problem  solving,  compared  with  arithmetic  computation,  has  been 
found  to  be  more  strongly  related  to  more  advanced  mathematical 
knowledge  (Fuchs  et  al.,  2014).  Because  arithmetic  computation 
and  arithmetic  word-problem  solving  differ  mainly  through  the 
presence  or  absence  of  a  number-sentence  construction  process, 
this  additional  process  seems  to  be  an  important  factor  in  the 
aforementioned  relation.  As  a  result  of  these  considerations,  the 
current study examined children’s arithmetic word-problem solving
by investigating individual differences in the number-sentence
construction  process. 

Arithmetic  Word-Problem  Solving 

Research  on  arithmetic  word-problem  solving  has  some  history. 
Many  early  studies  were  conducted  to  examine  how  children  and 
adults  solve  arithmetic  word  problems,  and  researchers  were  par¬ 
ticularly  interested  in  the  cognitive  processes  and  mental  repre¬ 
sentations  involved  in  solving  arithmetic  word  problems.  For  ex¬ 
ample,  Riley  and  Greeno  (1988)  proposed  cognitive  models  that 
explain  how  the  position  of  the  unknown  affects  the  difficulty  of 
word  problems.  In  general,  start-unknown  problems  (e.g.,  the 
problem  “After  getting  5  candies,  Susan  now  has  8  candies.  How 
many  candies  did  Susan  have  originally?”  can  be  converted  into 
x + 5 = 8, in which the starting value is unknown) are more
difficult  because  problem  solvers  must  transform  the  problem  into 
a  part-whole  relation  before  they  can  decipher  the  problem.  On  the 
other  hand,  Mayer  and  Hegarty  (1996)  put  forth  that  the  arithmetic 
word-problem  solving  process  can  be  divided  into  four  stages: 
translation,  integration,  planning,  and  execution.  During  the  first 
stage, translation, the problem solver constructs a mental
representation of the situation as described in the problem. Afterward,
at  the  integration  stage,  the  problem  solver  relates  the  different 
pieces  of  information  presented.  Next,  the  problem  solver  devises 
a  plan  (e.g.,  choosing  the  right  operations,  putting  the  numbers  into 
the  equation)  to  solve  the  problem  at  the  planning  stage.  Finally, 
with  the  execution  stage,  the  problem  solver  solves  the  problem  by 
performing  the  required  computations  as  formulated.  Let  us  take 
the  following  word  problem  as  an  example: 

“Tom has $8. He has $2 more than Sam does. How much does Sam
have?”

To  solve  this,  the  problem  solver  first  needs  to  form  a  mental 
representation  of  Tom  and  Sam,  and  that  Tom  has  $8  and  Sam  has 
some  money  (translation).  They  then  need  to  relate  the  amounts 
possessed by Tom and Sam (integration, e.g., Tom’s amount >
Sam’s) and put together a plan to solve the problem (planning, 8
− 2 = ?). The solver then performs the required computation
(execution, 8 − 2 = 6).
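
The four stages can be made concrete with a minimal Python sketch of the Tom-and-Sam problem. The function names and the dictionary representation are illustrative assumptions and are not part of Mayer and Hegarty's (1996) model.

```python
# Illustrative sketch only: function names and data structures are assumptions.

def translate():
    # Translation: represent each statement of the problem.
    return {"Tom": 8, "Sam": None, "relation": ("Tom", "more than", "Sam", 2)}

def integrate(rep):
    # Integration: relate the quantities (Tom's amount = Sam's amount + 2).
    larger, _, smaller, diff = rep["relation"]
    return {"larger": larger, "smaller": smaller,
            "difference": diff, "known": rep[larger]}

def plan(model):
    # Planning: the unknown is the smaller amount, so subtract the difference.
    return (model["known"], "-", model["difference"])

def execute(number_sentence):
    # Execution: carry out the computation in the number sentence.
    a, op, b = number_sentence
    return a - b if op == "-" else a + b

sentence = plan(integrate(translate()))
answer = execute(sentence)
print(sentence, answer)  # (8, '-', 2) 6
```

The first three functions jointly produce the number sentence; only the last one performs arithmetic.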

On  the  basis  of  the  four-stage  model,  Hegarty  and  colleagues 
(Hegarty,  Mayer,  &  Green,  1992;  Hegarty,  Mayer,  &  Monk,  1995) 
identified  two  major  approaches  in  arithmetic  word-problem  solv¬ 
ing,  especially  when  solving  inconsistent  problems  (problems  that 
contain  the  word  more  but  require  the  use  of  subtraction).  The  two 
approaches  seem  to  undergo  similar  processes  during  the  transla¬ 
tion  stage,  reflected  by  the  similar  amount  of  time  spent  on  the 
initial reading of the problem (Hegarty et al., 1992). Their differences
lie mainly in the integration and planning stages. In the
direct  translation  approach,  the  problem  solvers  directly  translate 
the  key  relational  terms  (e.g.,  more,  less)  into  the  corresponding 
arithmetic  operation  (i.e.,  more  translated  into  addition).  They  pay 
more  attention  to  the  numbers  and  relational  terms  than  to  the 
variable  names,  suggesting  that  they  directly  translate  these  num¬ 
bers  and  relational  terms  into  number  sentences  (Hegarty  et  al., 
1992,  1995).  Possibly  because  of  an  incomplete  understanding  of 
the  problems,  these  problem  solvers  perform  worse  in  the  task. 
Other  problem  solvers  use  the  problem  model  approach.  They  first 
construct  a  mental  representation  of  the  situation  described  in  the 
problem  before  they  devise  a  plan  to  solve  it.  These  problem 
solvers  spend  more  time  in  the  integration  and  planning  stages,  and 
they  spend  a  more  balanced  amount  of  time  on  variable  names, 
numbers,  and  relational  terms.  This  suggests  that  they  actively 
construct  the  problem  situation  in  their  mind  before  conceiving  a 
plan  to  answer,  which  may  explain  their  better  performance  in 
arithmetic  word-problem  tasks  (Hegarty  et  al.,  1992,  1995). 
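
The contrast between the two approaches can be sketched in Python for an inconsistent problem. The keyword table and the way the problem is encoded are assumptions made for illustration; they are not taken from Hegarty and colleagues' studies.

```python
# Illustrative sketch only: the keyword table and problem encoding are assumptions.

KEYWORD_TO_OPERATION = {"more": "+", "less": "-"}

def direct_translation(known, keyword, difference):
    # Map the relational term straight onto an operation,
    # without representing who actually has the larger amount.
    op = KEYWORD_TO_OPERATION[keyword]
    return known + difference if op == "+" else known - difference

def problem_model(known, keyword, difference):
    # Build the situation first. The known amount belongs to the subject of the
    # relational statement: "A has `difference` more than B" means A = B + difference,
    # so the unknown B is recovered by undoing that relation.
    sign = 1 if keyword == "more" else -1
    return known - sign * difference

# "Tom has $8. He has $2 more than Sam does. How much does Sam have?"
print(direct_translation(8, "more", 2))  # 10 (incorrect)
print(problem_model(8, "more", 2))       # 6 (correct)
```

On consistent problems the two strategies would give the same answer, which is why inconsistent problems are the informative case.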

However,  the  focus  of  the  research  on  arithmetic  word-problem 
solving  has  shifted  in  recent  decades.  Many  of  the  recent  studies  on 
arithmetic  word-problem  solving  focus  on  the  identification  of 
correlates.  In  particular,  the  relations  among  working  memory, 
word  reading,  and  arithmetic  word-problem  solving  have  received 
a  lot  of  attention.  Both  word  reading  (Bjork  &  Bowyer-Crane, 
2013;  Fuchs  et  al.,  2006)  and  working  memory  (Andersson,  2007; 
Kyttala,  Aunio,  Lepola,  &  Hautamaki,  2014;  Lee,  Ng,  &  Ng,  2009; 
Swanson,  Jerman,  &  Zheng,  2008)  have  been  found  to  predict 
people’s  arithmetic  word-problem  solving  accuracy.  These  find¬ 
ings  have  informed  us  about  the  important  correlates  involved  in 
word-problem  solving. 

Although  there  is  a  substantial  amount  of  literature  on  both  the 
processes  and  correlates  of  word-problem  solving,  integration  of 
the  aforementioned  approaches  is  seldom  observed.  There  are  both 
strengths  and  limitations  of  each  approach,  and  integration  of 
various  approaches  may  provide  a  more  comprehensive  under¬ 
standing.  For  instance,  although  the  process  approach  has  shed 
light  on  the  processes  and  mental  representations  pertaining  to 
solving  arithmetic  word  problems,  it  has  uncovered  little  informa¬ 
tion  on  the  cognitive  skills  related  to  problem-solving  processes. 
Conversely, studies investigating correlates of arithmetic word-problem
solving have usually examined only the outcomes. Because
there are various component processes in arithmetic word-problem
solving,  it  is  not  yet  apparent  how  these  correlates  are  related  to 
each  of  the  component  processes.  Such  insights  would  allow 
educators  to  devote  more  focused  attention  to  the  particular  com¬ 
ponent  process  children  find  difficult,  hence  improving  the  effi¬ 
ciency  of  the  intervention.  Therefore,  the  current  study  aimed  to 
examine  the  correlates  of  the  component  processes  in  arithmetic 
word-problem  solving. 

We  divided  the  whole  arithmetic  word-problem  solving  process 
into  two  major  component  processes:  number-sentence  construc¬ 
tion  and  arithmetic  computation.  Number-sentence  construction 
refers  to  the  process  by  which  the  problem  solvers  write  down 
number sentences based on the problem (i.e., writing 8 − 2 = in the
previous  example).  It  corresponds  to  the  first  three  stages  in 
arithmetic word-problem solving as suggested by Mayer and Hegarty
(1996). We did not further divide the process into subprocesses
because (a) previous studies indicated that there were few
individual differences during the translation stage because superior
and  inferior  problem  solvers  spend  similar  amounts  of  time  at  this 
stage  (Hegarty  et  al.,  1992)  and  that  errors  based  on  the  inability  to 
understand  the  problems  are  minimal  (Lee  et  al.,  2009;  Pape, 
2003);  (b)  the  integration  and  planning  stages  are  usually  incor¬ 
porated  in  an  iterative  manner  and  are  thus  inseparable  (Hegarty  et 
al.,  1992,  1995);  and  (c)  number  sentences  are  required  when 
children  solve  arithmetic  word  problems  in  local  settings,1  but  the 
products  of  the  intermediate  subprocesses  are  not.  However,  arith¬ 
metic  computation  refers  to  the  execution  stage  in  Mayer  and 
Hegarty’s (1996) model, and it was assessed by asking the participants
to solve an arithmetic problem with the number sentences
given  (e.g.,  solve  77  -  49  =).  As  a  result  of  the  products  of  the  two 
component  processes  (i.e.,  the  number  sentences  and  the  compu¬ 
tation  answers,  measured  by  the  performance  accuracy  of  the  two 
component  tasks  in  this  study),  inferences  can  be  made  about  the 
relevant  component  processes  (i.e.,  construction  of  mental  repre¬ 
sentation  and  computation). 
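
As a rough illustration of how the two products might be scored separately, the following Python sketch splits a written response into its number sentence and its answer. The response format and the scoring rule are assumptions for illustration, not the study's actual scoring protocol.

```python
# Illustrative sketch only: the response format and scoring rules are assumptions.

def score_word_problem(response, correct_sentence, correct_answer):
    """Score a written response such as '8 - 2 = 6' on two separate components."""
    left, _, right = response.partition("=")
    sentence = left.replace(" ", "")
    answer = right.strip()
    return {
        # 1 if the number sentence matches, regardless of the computed answer
        "number_sentence": int(sentence == correct_sentence.replace(" ", "")),
        # 1 if the final answer matches
        "answer": int(answer == str(correct_answer)),
    }

print(score_word_problem("8 - 2 = 6", "8 - 2", 6))   # {'number_sentence': 1, 'answer': 1}
print(score_word_problem("8 + 2 = 10", "8 - 2", 6))  # {'number_sentence': 0, 'answer': 0}
print(score_word_problem("8 - 2 = 5", "8 - 2", 6))   # {'number_sentence': 1, 'answer': 0}
```

The third case shows why the separation matters: a correct number sentence with a computation slip would be invisible if only the final answer were scored.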

Cognitive  Predictors  of  Arithmetic 
Word-Problem  Solving 

As  mentioned,  various  studies  have  been  conducted  to  deter¬ 
mine  the  correlates  of  arithmetic  word-problem  solving.  These 
investigations  have  focused  on  domain-general  cognitive  skills 
(cognitive  skills  that  are  related  to  various  domains  of  achieve¬ 
ment;  e.g.,  intelligence  and  working  memory)  and  word-reading 
skills.  These  factors  are  important  for  word-problem  solving  be¬ 
cause  problem  solvers  need  to  read  the  problems  (requiring  word¬ 
reading  skills),  store  the  read  information  in  memory  for  further 
processing  (requiring  working  memory),  and  exercise  their  reason¬ 
ing  skills  to  figure  out  the  relations  among  variables  within  the 
problems  (requiring  nonverbal  intelligence;  verbal  intelligence, 
which  is  indicated  by  the  breadth  and  depth  of  one’s  verbal 
knowledge, is less relevant here). The link between these cognitive
capacities and arithmetic word-problem solving has already been
supported by previous findings (e.g., word reading [Bjork &
Bowyer-Crane, 2013; Fuchs et al., 2006]; working memory [Andersson,
2007; Swanson, 2011; Swanson et al., 2008]; and nonverbal
intelligence [Andersson, 2007; Jogi & Kikas, 2015]).

1 An analysis of local first-grade mathematics textbooks revealed that
most of the word problems required either both number sentences and
answers (65%) or only answers (30%). Only a small portion of the items
(5%) required participants to solve the problems through other formats
(e.g., drawing, using manipulatives). Therefore, the number sentence is
considered the major way in which the participants demonstrate their
thinking in arithmetic word-problem solving.

In  addition  to  domain-general  cognitive  skills  and  reading  skills, 
children’s  processing  of  numerical  magnitude,  or  the  capacity  for 
representing  the  magnitude  of  information  in  numerical  stimuli 
(e.g.,  comparing  the  numerical  magnitudes  of  Arabic  numerals  and 
converting  symbolic  numerical  information  into  the  corresponding 
nonsymbolic  forms),  has  received  much  attention  from  the  math¬ 
ematics  learning  literature,  with  the  understanding  that  our  basic 
numerical-magnitude  processing  serves  as  the  foundation  for 
higher  numerical  and  mathematical  skills  (De  Smedt,  Noel,  Gilmore, 
&  Ansari,  2013;  Dehaene,  2009).  However,  researchers  tend  to  dis¬ 
agree  on  the  form  of  numerical-magnitude  representation  that  forms 
the  basis  of  our  mathematical  skills.  Several  researchers  have  sug¬ 
gested  that  nonsymbolic  numerical-magnitude  processing  (the  innate 
ability  to  represent  and  discriminate  numerosities,  usually  measured 
by  dot  comparison  tasks)  serves  as  the  foundation  of  our  mathematical 
skills  (e.g.,  Halberda,  Mazzocco,  &  Feigenson,  2008;  Piazza  et  al., 
2010)  whereas  others  tend  to  think  that  it  is  the  ability  to  map 
symbolic  numbers  onto  nonsymbolic  numerosity  representations 
(usually  measured  by  mapping  tasks  between  nonsymbolic  numeros¬ 
ity  and  symbolic  numerals)  and  our  symbolic  numerical  processing 
skills  (usually  assessed  by  numeral  comparison  tasks)  that  matter 
(e.g.,  De  Smedt  et  al.,  2013;  Mundy  &  Gilmore,  2009;  Sasanguie, 
Gobel,  Moll,  Smets,  &  Reynvoet,  2013).  Although  direct  evidence  for 
the  connection  between  arithmetic  word-problem  solving  and 
numerical-magnitude  representation  is  lacking,  one  previous  work 
found  that  people  do  in  fact  activate  their  innate  numerical-magnitude 
representation  when  they  represent  arithmetic  word  problems  (Orran- 
tia  &  Munez,  2013).  Numerical-magnitude  representation  is  also 
found  to  be  significantly  and  positively  correlated  with  children’s 
arithmetic  computation  performance  at  a  moderate  magnitude  (aver¬ 
age  r  =  .281;  Schneider  et  al.,  2016).  Overall,  these  findings  suggest 
a  correlation  between  arithmetic  word-problem  solving  and 
numerical-magnitude  processing.  However,  what  remains  unknown  is 
the  particular  type  of  numerical-magnitude  processing  that  drives  the 
relation.  Therefore,  these  numerical-magnitude  processing  variables 
were  included  in  the  current  study  to  address  this  issue. 

Although  the  relations  between  arithmetic  word-problem  solv¬ 
ing  and  various  correlates  have  been  directly  or  indirectly  demon¬ 
strated,  the  exact  roles  of  these  correlates  in  word-problem  solving 
processes remain to be explored because most studies that sought
to  identify  such  correlates  focused  strictly  on  the  answers  to  the 
word  problems,  not  on  the  component  processes.  Whether  the 
correlates  actually  predict  the  number-sentence  construction  pro¬ 
cess  or  the  computation  process  has  not  been  evaluated.  By  sep¬ 
arating  the  two  component  processes  and  including  a  more  com¬ 
plete  set  of  potential  cognitive  predictors,  the  current  study  allows 
us  to  better  understand  the  roles  of  different  cognitive  capacities  in 
word-problem  solving  processes.  Because  number-sentence  con¬ 
struction  involves  understanding  the  scenario  presented  in  the 
problems,  it  seems  to  require  more  word-reading  skills,  reasoning 
skills,  and  working  memory  capacity.  Therefore,  it  is  expected  that 
these skills are more strongly related to number-sentence construction
than to arithmetic computation. However, numerical-magnitude
processing capacities might be more strongly related to
arithmetic computation than to number-sentence construction.
Whereas the representation of numerical magnitude seems to
be an inherent requirement of arithmetic computation (one needs
to know how much 8 and 9 are before one can figure out the
answer to 8 + 9), it is the way in which the numbers are arranged
in the number sentences, instead of the numerical magnitudes of
the numbers themselves, that matters to the number-sentence construction
process. The difficulty of the number-sentence construction
process remains the same even if the numbers involved
are changed.

Relative  Contribution  of  Number-Sentence 
Construction  Versus  Arithmetic  Computation  to 
Future  Mathematics  Achievement 

The  division  of  the  arithmetic  word-problem  solving  process 
into  the  two  component  processes  (number-sentence  construction 
and  computation)  allows  for  examining  the  relative  contributions 
of  the  two  component  processes  to  the  outcome  of  arithmetic 
word-problem  solving.  As  mentioned  already,  findings  from  pre¬ 
vious  studies  have  suggested  that  although  both  component  pro¬ 
cesses  are  essential  for  solving  arithmetic  word  problems  (Hegarty 
et  al.,  1992;  Lewis  &  Mayer,  1987),  errors  in  word-problem 
solving  seem  to  be  brought  about  mainly  by  the  use  of  wrong 
number  sentences  (Lewis  &  Mayer,  1987;  Muth,  1991).  However, 
these  studies  looked  at  arithmetic  word-problem  solving  in  middle 
school students and undergraduates, both of whom should be very
familiar  with  basic  arithmetic  computations.  This  may  downplay 
the  significance  of  arithmetic  computation  in  word-problem  solv¬ 
ing.  By  using  grade-appropriate  items,  we  can  more  clearly  estab¬ 
lish  the  relative  contributions  of  the  two  component  processes  to 
arithmetic  word-problem  solving. 

Of  even  greater  interest  are  the  effects  of  these  two  component 
processes  on  future  arithmetic  computation  performance  and  gen¬ 
eral  mathematics  achievement.  Arithmetic  computation  is  funda¬ 
mental  for  building  advanced  mathematical  knowledge,  such  as 
algebra  (Lee,  Ng,  Bull,  Pe,  &  Ho,  2011)  and  fraction  concepts 
(Vukovic  et  al.,  2014).  However,  the  contribution  of  number- 
sentence  construction  skills  to  other  mathematical  tasks  has  rarely 
been  examined.  It  is  reasonable  to  suggest  that  number-sentence 
construction  skills  should  be  at  least  equally  important  for  solving 
other  mathematical  problems.  Compared  with  arithmetic  compu¬ 
tation,  number-sentence  construction  is  a  relatively  ill-defined 
task.  To  construct  the  right  number  sentences,  a  learner  needs  to 
locate  the  relevant  information  from  the  given  text,  recognize  the 
question,  and  recognize  the  correct  operations.  Successful  word- 
problem  solvers  need  to  be  flexible  to  avoid  being  misled  by  the 
superficial  features  of  a  word  problem  (e.g.,  less  than  does  not 
automatically  mean  subtraction),  use  their  reasoning  skills  to  un¬ 
derstand  the  interrelations  among  variables,  and  demonstrate  a 
solid  conceptual  understanding  of  the  arithmetic  operations  so  that 
they  can  determine  the  right  operations  to  be  used.  These  qualities 
are  also  required  in  other  domains  of  mathematics  learning,  such  as 
statistics  (identifying  relevant  information  from  figures  and  ta¬ 
bles),  measurement  (knowing  when  to  use  a  particular  operation  or 
equation  to  calculate  area  or  volume),  and  algebra  (performing  the 
reverse  of  arithmetic  operations,  requiring  understanding  of  the 
inversions  of  arithmetic  operations).  Therefore,  it  is  reasonable  to 
believe  that  number-sentence  construction  skills  are  themselves 
significant predictors of future mathematical outcomes. However,
most  studies  on  arithmetic  word-problem  solving  used  the  word- 
problem  measure  only  as  a  measure  of  the  results  (e.g.,  Bjork  & 
Bowyer-Crane,  2013;  Zheng,  Swanson,  &  Marcoulides,  2011).  As 
such,  it  is  uncertain  how  arithmetic  word-problem  solving  skills,  or 
number-sentence  construction  skills  in  particular,  are  related  to 
other  mathematical  tasks.  Fuchs  and  colleagues’  (2014)  training 
study  may  provide  hints  to  this  matter.  Having  trained  a  large 
sample  of  second  graders  in  either  arithmetic  computation  or  the 
construction  of  number  sentences  based  on  arithmetic  word  prob¬ 
lems,  they  found  that  number-sentence  construction  intervention, 
but  not  computation  intervention,  brought  about  noteworthy  im¬ 
provement  in  children’s  prealgebraic  knowledge  (solving  algebraic 
equations; e.g., find the value of x in x + 5 = 11). This highlights
the  importance  of  number-sentence  construction  in  learning  pre¬ 
algebraic  knowledge.  The  current  study  looked  to  extend  this  by 
exploring  the  contributions  of  children’s  number-sentence  con¬ 
struction  and  arithmetic  computation  skills  to  their  future  mathe¬ 
matics  achievement  and  arithmetic  computation  skills.  These  two 
skills  were  chosen  as  the  outcome  measures  because  although  the 
former  is  a  comprehensive  measure  of  children’s  general  mathe¬ 
matics  knowledge,  the  latter,  as  discussed  earlier,  is  considered  the 
most  fundamental  mathematical  skill  related  to  many  areas  of 
mathematics.  Both  components  were  expected  to  be  uniquely 
predictive  of  future  mathematics  achievement.  However,  it  was  not 
certain whether number-sentence construction would still predict future
arithmetic computation when concurrent arithmetic computation skills
had been taken into account.

The  Current  Study 

In  view  of  the  preceding  literature  review,  we  divided  the 
arithmetic  word-problem  solving  process  into  two  component 
processes — number-sentence  construction  and  computation — in 
the  current  study  so  that  the  following  goals  could  be  reached:  (a) 
to  identify  the  predictors  of  these  component  processes,  (b)  to 
compare  the  relative  contributions  of  these  component  processes  to 
overall  arithmetic  word-problem  solving,  and  (c)  to  examine  how 
performance  in  these  component  processes  could  predict  future 
computation and mathematics achievement in children. The present
study was part of a 3-year longitudinal study on mathematics
learning  with  four  time  points  (Times  1-4;  see  Table  1  for  details 
on  the  time  points  and  the  measures  conducted  at  each).  At  Time 
3,  a  sample  of  first  graders  was  given  an  arithmetic  word-problem 
solving  task  in  which  they  were  requested  to  write  number  sen¬ 
tences  (e.g.,  9  -  2  =)  before  solving  problems.  A  parallel  set  of 
arithmetic  computation  items  was  provided  to  measure  the  chil¬ 
dren’s  computation  skills.  This  design  allowed  for  teasing  apart  the 
participants’  number-sentence  construction  skills  from  their  com¬ 
putation  skills.  To  identify  early  predictors  of  these  two  component 
word-problem  skills,  various  numerical-magnitude  processing 
skills  and  domain-general  cognitive  skills  were  assessed  6  months 
or  1  year  before  they  attempted  the  word  problems  (Times  1  and 
2).  The  participants’  mathematics  achievement  was  assessed  1  year 
after  they  attempted  the  arithmetic  word  problems  (Time  4).  This 
permitted  examining  the  relations  between  the  two  component 
word-problem  skills  and  future  mathematics  outcomes. 

Method 

Participants 

In  this  longitudinal  study,  an  initial  sample  of  210  kindergar¬ 
teners  (mean  age  =  73  months,  SD  =  4  months,  52.1%  were  boys) 
was  recruited  from  17  Hong  Kong  kindergartens  with  parental 
consent  at  Time  1.  As  a  consequence  of  attrition,  153  children 
(mean age = 97 months, SD = 3.96 months, 53.6% were boys)
remained  by  the  final  wave  of  testing  (Time  4)  when  they  were  in 
Grade  2.  Participants  who  dropped  out  performed  similarly  to 
those  who  remained  in  the  final  sample  in  five  of  six  measures 
conducted  at  Time  1  (except  for  number-numerosity  mapping,  in 
which  the  final  sample  did  slightly  better,  t  =  2.538,  p  =  .012), 
suggesting  that  the  attrition  was  not  systematically  biased.  All 
participants  were  Chinese  speaking.  Mathematics  learning  in  Hong 
Kong  involves  a  large  amount  of  drilling.  Although  Hong  Kong 
mathematics  classes  are  relatively  well  structured  compared  with 
their  Western  counterparts,  teachers  in  Hong  Kong  spend  less  time 
connecting  school  mathematics  to  real-life  examples  (Leung, 
2005). Because the mathematics curriculum in the local primary
schools is standardized, instructional differences based on the
involvement of various schools in this study were expected to be
minimal, and this was supported by the absence of significant
school effects on any of the measures in this study (all Fs < 1.6, all
ps > .09).

Table 1

Time Points in Which the Measures Were Conducted

Column headings: Variable category; Variable; Time 1 (end of K3); Time 2 (middle of Grade 1); Time 3 (end of Grade 1); Time 4 (end of Grade 2)

Early predictors
    Nonsymbolic numerical processing            X
    Symbolic numerical processing               X
    Symbolic-nonsymbolic mapping                X
    Phonological loop                           X
    Visual-spatial sketchpad                    X
    Central executive                           X
    Nonverbal intelligence                      X
    Word reading                                X
Word-problem component processes
    Arithmetic (Set 1)                          X
    Number sentence construction                X
Outcomes
    Word problem (answers)                      X
    Arithmetic (Set 2)                          X
    Learning and achievement measurement kit    X

Note. K3 = third year of kindergarten.

Measures 


Four kinds of measures were conducted in the current study:
numerical-magnitude processing measures (nonsymbolic numerical
processing, symbolic numerical processing, and symbolic-nonsymbolic
mapping), domain-general measures (phonological
loop, visuospatial sketchpad, central executive, and nonverbal
intelligence), word reading, and arithmetic word-problem and outcome
measures (arithmetic word problems, computation, and
mathematics achievement). Although the number-specific measures
were conducted on computers, the others were performed on
paper or in a verbal format. Discontinuation rules applied only to
three of the measures (Corsi block, backward digit span, and word
reading). The reliabilities of the measures can be seen in Table 2.

Numerical-Magnitude Processing Measures

Nonsymbolic numerical processing. The nonsymbolic numerical
comparison task, modified from Piazza, Izard, Pinel, Le
Bihan, and Dehaene (2004), was used to assess the participants’
nonsymbolic numerical processing. The participants saw two arrays
of dots (varying by size) on the screen, and they were to
decide which array had more dots without counting. One of the
arrays always contained 16 dots, whereas the other array had
10-22 dots (resulting in 10 ratios, from 1.6 to 1.063). In half of the
trials, the average dot size was directly proportional to the numerosity;
the inverse was true for the other half. There were five
practice trials followed by 50 experimental trials. The Weber
fraction, w,2 was utilized in the analyses.

Symbolic numerical processing. The participants’ symbolic
numerical processing was assessed using the number comparison
task. In each trial, two Arabic numerals were presented on a
computer screen, and the participants had to decide which numeral
was numerically larger and then press the corresponding key (F
indicated that the numeral on the left was larger; J indicated
otherwise). The numerals ranged from 1 to 9, and the numerical
distance between the numeral pairs ranged from 1 to 4. There were
4 practice trials and 36 experimental trials (with the same number
of items for each numerical distance). Because the average accuracy
of this task was high (>90%), the reaction time (RT) was
used to indicate participants’ symbolic numerical processing skills.

Symbolic-nonsymbolic mapping. The numerosity production
task, adapted from Crollen, Castronovo, and Seron (2011),
was used to assess the participants’ symbolic-nonsymbolic mapping
skills. The experimenter first presented a screen shot of 100
dots to the participants and told them that there were 100 dots
present. Afterward, the participants saw an Arabic numeral on the
screen and they had to press the corresponding key to produce
approximately the same number of dots. The numbers ranged from
10 to 100. In half of the trials, the average dot size was held
constant; in the other half, the total area occupied by the dots was
held constant. The participants were explicitly told not to count,
and they had to hold the key down so that they could not count.
There were 3 practice trials (3, 5, and 7) followed by 20 experimental
trials. Because linearity (R2 of the best-fit linear function,
or the amount of explained variance of the participants’ responses
by the target values) was found to be a better indicator of children’s
estimation skills compared with the percentage of absolute
error (Booth & Siegler, 2006), linearity was used in the subsequent
analyses.
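
The linearity index used for the numerosity production task (the R2 of the best-fit line relating responses to target values) can be computed as in the following Python sketch. The response data are made up for illustration and do not come from the study; the function name is likewise an assumption.

```python
# Illustrative sketch only: the data below are invented, not taken from the study.

def linearity_r2(targets, responses):
    """R^2 of the best-fit line predicting responses from target numerals."""
    n = len(targets)
    mean_x = sum(targets) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in targets)
    syy = sum((y - mean_y) ** 2 for y in responses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(targets, responses))
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)

targets = [10, 25, 40, 55, 70, 85, 100]
perfect = [2 * t + 3 for t in targets]     # any exact linear mapping
compressed = [30, 52, 60, 66, 71, 74, 77]  # responses that flatten for large targets

print(round(linearity_r2(targets, perfect), 3))  # 1.0
print(linearity_r2(targets, compressed) < 1.0)   # True
```

Note that this index rewards a linear relation between targets and responses; it does not require the produced numerosities to be accurate in absolute terms, which is what the percentage of absolute error captures instead.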


Domain-General  Measures 

Phonological  loop.  A  syllable-recall  task  was  administered  to 
assess  participants’  phonological  loop.  In  each  trial,  the  partici¬ 
pants  listened  to  a  sequence  of  Cantonese  syllables  at  a  pace  of  one 
per  second,  and  they  had  to  repeat  the  sequence  in  the  exact  order 
after  listening.  After  a  practice  trial,  there  were  15  experimental 
trials  arranged  in  five  difficulty  levels.  The  number  of  syllables  to 
be  recalled  increased  by  one  with  each  ascending  level.  Each 
correctly  recalled  syllable  and  correct  order  yielded  one  mark. 

Visuospatial  sketchpad.  The  Corsi  block  task  was  used  as  a 
measure  of  the  participants’  visuospatial  sketchpad  capacity.  The 
participants  were  shown  a  black  board  with  nine  black  boxes  on  it. 
In  each  trial,  the  experimenter  tapped  the  boxes  in  a  certain 
sequence  at  a  pace  of  one  box  per  second.  After  watching,  the 
participants  had  to  tap  the  boxes  in  the  exact  same  order.  Each 
exact  recall  scored  one  mark.  Two  practice  trials  were  followed  by 
14  experimental  trials  arranged  at  seven  difficulty  levels.  The 
length  of  the  sequence  increased  by  one  with  each  ascending  level. 
The  task  was  terminated  when  the  participants  failed  two  items  at 
any  level. 

Central  executive.  The  central  executive  of  the  participants 
was  evaluated  using  the  backward  digit  span  task.  The  participants 
were  verbally  presented  with  a  sequence  of  numbers  at  a  rate  of 
one per second. They were then asked to recall the numbers in
reverse order. Each exact recall scored one mark. Similar to the Corsi
block  task,  there  were  2  practice  trials  and  14  experimental  trials 
arranged  at  seven  difficulty  levels.  With  each  ascending  level,  the 
number of digits to be recalled increased by one. When the
participants  failed  two  items  at  any  given  level,  the  task  was 
terminated. 
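
The scoring and discontinuation rule shared by the Corsi block and backward digit span tasks can be sketched in Python as follows. The data layout (a list of levels, each holding pass/fail outcomes) and the assumption of two trials per level are illustrative.

```python
# Illustrative sketch only: the data layout is an assumption.

def score_span_task(results_by_level):
    """results_by_level: list of levels, each a list of booleans (True = exact recall)."""
    score = 0
    for level in results_by_level:
        score += sum(level)             # one mark per exact recall
        if level.count(False) >= 2:     # two failed items at a level: stop the task
            break
    return score

# Seven levels with two trials each (14 experimental trials in total).
# Levels after the stopping point are listed only to show that they are ignored.
child = [[True, True], [True, False], [False, True], [False, False],
         [True, True], [True, True], [True, True]]
print(score_span_task(child))  # 4
```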

Nonverbal  intelligence.  The  short  form  (set  A  to  set  C)  of 
Raven’s  Standard  Progressive  Matrices  (Raven,  1956)  was  used  to 
gauge  the  nonverbal  intelligence  of  the  participants.  It  was  also 
used  for  the  same  purpose  in  hundreds  of  other  studies  (see 
Wongupparaj, Kumari, & Morris, 2015, for a meta-analysis), having
been shown to correlate with arithmetic word-problem solving
(e.g.,  Traff,  2013).  For  each  of  the  36  items,  the  participants  saw 
a  visual  pattern  with  a  missing  piece.  They  were  then  asked  to 
identify  the  correct  piece,  out  of  six  to  eight  alternatives,  that  could 
fit  the  missing  place  in  the  visual  pattern.  Each  correct  item  scored 
one  mark.  Raw  scores  were  translated  into  scaled  scores  based  on 
local  norms. 


2 w is used to refer to the finest ratio of the two numerosities that one can
reliably distinguish. It is the difference between the two numbers divided
by the smaller number. For example, if the finest ratio that one can
distinguish is 6:7, then the w of that person would be (7 − 6)/6 = 0.167. The
w of the participants in this study was calculated based on the formula:

\[
\text{error rate} = \frac{1}{2}\,\operatorname{erfc}\!\left(\frac{|n_1 - n_2|}{\sqrt{2}\,w\,\sqrt{n_1^2 + n_2^2}}\right),
\]

where n_1 and n_2 are the two numerosities being compared.
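
A minimal Python sketch of the Weber fraction computations follows. It assumes the commonly used model in which the expected error rate equals 0.5 × erfc(|n1 − n2| / (√2 · w · √(n1² + n2²))). The study does not state how w was estimated from the data, so the grid search below is purely an illustrative stand-in, and all function names are hypothetical.

```python
import math

def weber_fraction_from_ratio(smaller, larger):
    # Difference between the two numerosities divided by the smaller one.
    return (larger - smaller) / smaller

def expected_error_rate(n1, n2, w):
    # Assumed model: 0.5 * erfc(|n1 - n2| / (sqrt(2) * w * sqrt(n1^2 + n2^2)))
    return 0.5 * math.erfc(
        abs(n1 - n2) / (math.sqrt(2) * w * math.sqrt(n1 ** 2 + n2 ** 2))
    )

def fit_w(trials, candidates):
    """trials: (n1, n2, observed_error_rate) tuples; return the candidate w
    with the smallest summed squared error (illustrative grid search)."""
    def loss(w):
        return sum((expected_error_rate(a, b, w) - err) ** 2 for a, b, err in trials)
    return min(candidates, key=loss)

print(round(weber_fraction_from_ratio(6, 7), 3))  # 0.167

# Simulated error rates generated from w = 0.25, then recovered by the grid search.
trials = [(16, n, expected_error_rate(16, n, 0.25)) for n in (10, 12, 14, 18, 20, 22)]
candidates = [i / 100 for i in range(5, 101)]
print(fit_w(trials, candidates))  # 0.25
```

Under this model, a smaller w implies fewer expected errors at a given ratio, so lower values indicate sharper nonsymbolic discrimination.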
Table 2

Descriptive Statistics and Correlations Among Variables

                                              Correlations
Variable           M       SD  Max Reliability         1         2         3         4         5         6         7         8         9        10        11        12
1. NSP         0.241    0.125   NA          NA        --
2. SP      1,523.254  413.792   NA        .952    .227**        --
3. SNM         0.489    0.255    1          NA    -.162†  -.310***        --
4. PL         85.748   19.445  225        .770      .037    -.170*   .295***        --
5. VSSP        5.923    1.548   14        .612   -.260**   -.225**      .124      .103        --
6. CE          5.315    1.207   14        .503     -.095    -.213*    .240**     .188*      .088        --
7. IQ        111.294   12.191   NA        .841   -.236**  -.300***     .211*     .197*     .182*     .209*        --
8. WR         72.874   26.008  150        .980     -.131   -.253**   .341***   .360***      .100    .250**    .280**        --
9. Ar3        14.902    2.246   18        .715     -.121  -.332***   .391***    .255**     .204*    .230**   .340***   .413***        --
10. NSC       16.566    3.695   22        .792    -.199*  -.329***    .284**   .356***    .217**     .204*   .433***   .344***   .547***        --
11. WPA       12.587    3.167   18        .809    -.198*  -.383***   .299***   .359***     .199*    .264**   .428***   .386***   .636***   .898***        --
12. Ar4       14.154    2.723   25        .729     -.052  -.314***    .236**     .214*      .126    .234**    .274**    .284**   .492***   .422***   .505***        --
13. LAMK      40.566    6.847   51        .876   -.257**  -.381***   .295***   .321***    .254**    .254**   .488***   .412***   .584***   .616***   .682***   .512***

Note. NA = not applicable; NSP = nonsymbolic processing; SP = symbolic processing; SNM = symbolic-nonsymbolic mapping; PL = phonological
loop; VSSP = visuospatial sketchpad; CE = central executive; IQ = nonverbal intelligence; WR = word reading; Ar3 = arithmetic computation Time
3; NSC = number sentence construction; WPA = arithmetic word-problem answers; Ar4 = arithmetic computation Time 4; LAMK = Learning and
Achievement Measurement Kit.

† p < .1. * p < .05. ** p < .01. *** p < .001.


Word reading. The participants’ word-reading skills were assessed using
the Chinese word-reading subtest from the Hong Kong Test of Specific
Learning Difficulties in Reading and Writing for Primary School Students,
Second Edition, or HKT-P(II) (Ho et al., 2007). The HKT-P(II) is a
standardized diagnostic tool for identifying children with dyslexia in the
local setting. In the word-reading subtest, the participants were asked to
read aloud 150 Chinese two-character words. Each correctly read word was
scored one mark. The task was terminated when a participant failed 15
consecutive items.
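
The discontinuation rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HKT-P(II) scoring procedure itself: the function name `score_word_reading` and the list-of-booleans input format are hypothetical.

```python
def score_word_reading(responses, stop_after=15):
    """Score a word-reading run: one mark per correctly read word.

    `responses` is a hypothetical list of booleans in presentation order
    (True = word read correctly). Testing stops once `stop_after`
    consecutive words have been failed, so later items earn no marks.
    """
    score = 0
    consecutive_failures = 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= stop_after:
                break  # discontinuation rule reached
    return score


# A child reads 40 words correctly, then fails 15 in a row; the remaining
# items are never administered.
example = [True] * 40 + [False] * 15 + [True] * 95
print(score_word_reading(example))
```

Under this rule the raw score can range from 0 to 150, the maximum listed for word reading in Table 2.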

Arithmetic  Word-Problem  and  Outcome  Measures 

Arithmetic word problems. An arithmetic word-problem task was designed by
the first author for the current study. The participants were presented
with 18 arithmetic word problems. The word problems were available both
visually (in print) and verbally (read out by the experimenter; presenting
the items verbally reduced the demand on reading skills). The participants
were instructed to write down the number sentences before they solved the
problems (see Figure 1 for an illustration). To familiarize the
participants with the task, a sample item was presented, and the
experimenter demonstrated the arithmetic word-problem solving process to
any participant who failed it. Fourteen of the items involved single-step
addition or subtraction, and four more advanced problems involved
multistep addition and/or subtraction. All numbers involved were less than
100. This task was scored in two ways. For number-sentence construction,
the participants obtained points for each correctly written number
sentence (1 point for simple ones and 2 points for advanced ones because
more than one relation was involved; partial scores were allowed). The
participants also received an answer score for each correct final answer.
Both the number-sentence construction score and the answer score were used
in later analyses.

Given: “There are 7 boys and several girls in the swimming class. There
are 3 more boys than girls. How many girls are there in the swimming
class?”
Participants’ response: 7 - 3 = 4
[Number sentence construction: 7 - 3] [Word problem answer: 4]

Figure 1. Illustration of the arithmetic word problem solving task.
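
The two scoring schemes can be illustrated with a short sketch. The text does not say how partial credit was assigned, so the `sentence_credit` fraction below is an assumption made purely for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class ItemResponse:
    advanced: bool          # True for the four multistep items
    sentence_credit: float  # assumed share of the number sentence(s) correct, 0.0 to 1.0
    answer_correct: bool    # whether the final answer was right


def score_word_problems(responses):
    """Return (number-sentence construction score, answer score)."""
    construction = sum(
        (2 if r.advanced else 1) * r.sentence_credit for r in responses
    )
    answers = sum(1 for r in responses if r.answer_correct)
    return construction, answers


# A perfect protocol: 14 single-step items and 4 multistep items.
perfect = [ItemResponse(False, 1.0, True)] * 14 + [ItemResponse(True, 1.0, True)] * 4
print(score_word_problems(perfect))
```

With 14 one-point items and 4 two-point items, the maximum construction score is 22 and the maximum answer score is 18, which agrees with the maxima listed for NSC and WPA in Table 2.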

Arithmetic computation. Two arithmetic computation tasks were utilized in
the current study. The task used at Time 3 paralleled the arithmetic
word-problem task. The participants were presented with 18 arithmetic
items in number-sentence form that they had to solve. The items were
exactly the same as those used in the arithmetic word-problem task, but
they were arranged in a different order. At Time 4, the participants were
presented with another 25 arithmetic items. These items involved
three-digit addition and subtraction; single- and two-digit multiplication
and division; and multistep, mixed operations. The participants received
one mark for each correct answer.

The order of administering the arithmetic word-problem task and the
arithmetic computation task was counterbalanced so that any potential
order effect would be cancelled out across participants. Given that (a)
the orders of the items in the two tasks were different, (b) the items
were so common that the participants should have encountered them in their
daily lives, and (c) none of the participants expressed concern that the
items of the two tasks were the same, it was unlikely that they had
recognized the similarity between the tasks. Even if the participants had
become aware of it, our results would not have been affected because no
feedback was provided to the participants.

Mathematics achievement. The Learning and Achievement Measurement Kit 2.0
Second Grade Mathematics (LAMK 2.0 Mb; Hong Kong Education Bureau, 2008)
was administered at Time 4 to measure participants’ mathematics
achievement. The LAMK was designed by the local education bureau to
identify children who needed remedial services in major subjects. The
current version (Second Grade Mathematics) consisted of 33 items, covering
all topics in the local curriculum, including arithmetic, shape and space,
measure, and simple statistics. The time limit for this task was 45 min,
and most children were able to finish it within this duration. The raw
score was used in the current analyses.

All measures, except for the Corsi block and the backward digit span, were
reliable. Furthermore, all of these measures (except for arithmetic
word-problem solving and arithmetic computation) had already been used by
other studies to measure the relevant constructs; therefore, they were
considered valid measurements. The arithmetic word-problem solving and
arithmetic computation measures were extracted from local textbooks for
measuring the relevant constructs, and their significant correlations with
mathematics achievement (rs > .5) indicated that they had high criterion
validity.

Procedures 

The participants were assessed by trained experimenters in their own
kindergartens at Time 1. They underwent three more testing sessions at
their homes: in the middle of first grade (Time 2) and at the end of first
grade (Time 3) and second grade (Time 4). Most predictor measures (except
nonverbal intelligence and word reading) were administered at Time 1,
whereas the outcome measures were administered at Time 3 (arithmetic
word-problem solving and arithmetic computation) and Time 4 (arithmetic
computation and mathematics achievement). The nonverbal intelligence test
was conducted at Time 2 because of the time constraint at Time 1, and the
word-reading task was administered at Time 3 so that the participants
would have received 1 year of formal instruction in reading before it was
assessed, which therefore provided a reliable measure of the ability.
Because both constructs were stable across time (Deary, 2014;
Kowaleski-Jones & Duncan, 1999), these minor differences in time of
measurement were not expected to have any significant effect on our
findings. The order of the tasks performed at the same time point was
counterbalanced across participants. Each testing session lasted for
approximately 2 h with breaks, and souvenirs were given to the
participants as tokens of appreciation.

Data  Analyses 

The following analyses were conducted using R (R Development Core Team,
2008) with the lavaan package (Rosseel, 2012). The data were first
screened for univariate outliers. Ten data points, which were >3 SD beyond
the corresponding means, were excluded because only participants with a
complete data set could be included in modeling. Multivariate outliers
were screened using Mahalanobis distance in SPSS. No multivariate outliers
were found. Therefore, the final sample consisted of 143 children. The
Mardia test was then applied to examine the skewness and kurtosis of the
data. The data were significantly skewed (skewness = 557.26, p < .001).
Hence, the models were analyzed using maximum likelihood estimation with
robust standard errors (MLR).
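
The univariate screening rule (values more than 3 SD from the mean) can be sketched as follows. This is an illustration only: the function name, the made-up scores, and the use of the sample standard deviation are assumptions, and it does not reproduce the multivariate (Mahalanobis) screening or the Mardia test.

```python
import statistics


def flag_univariate_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; whether the study used this form is an assumption
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * sd]


# Made-up scores with one extreme value in the last position.
scores = [50.0] * 19 + [200.0]
print(flag_univariate_outliers(scores))
```

Because the extreme value itself inflates the mean and SD, a rule of this kind can miss outliers in small samples, which is one reason such screening is usually paired with a multivariate check.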

In the following analyses, a path model approach was followed to evaluate
the relations among number-sentence construction, computation, and other
variables. On the basis of Hu and Bentler (1999), these criteria were used
for model evaluation: nonsignificant χ² test, comparative fit index (CFI)
> .95, root mean square error of approximation (RMSEA) < .06, and
standardized root mean square residual (SRMR) < .08.
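
As a rough illustration of how these cutoffs operate, the sketch below applies them to the fit statistics reported in the Results section for the initial and revised models. The helper names are hypothetical, and the RMSEA function uses a common textbook form with N - 1 in the denominator; software may use N or a robust correction, so the third decimal can differ from the published values.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a chi-square statistic (N - 1 form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_fit(p_value, cfi, rmsea_value, srmr):
    """Compare fit indices with the cutoffs listed above."""
    return {
        "chi2 nonsignificant": p_value > 0.05,
        "CFI > .95": cfi > 0.95,
        "RMSEA < .06": rmsea_value < 0.06,
        "SRMR < .08": srmr < 0.08,
    }


# Values as reported in the Results section.
initial = check_fit(p_value=0.038, cfi=0.953, rmsea_value=0.070, srmr=0.039)
revised = check_fit(p_value=0.356, cfi=0.994, rmsea_value=0.026, srmr=0.030)
print(initial)
print(revised)

# Approximate recomputation of RMSEA from the reported chi-square values.
print(rmsea(27.359, 16, 143), rmsea(16.404, 15, 143))
```

On these criteria the initial model fails the χ² and RMSEA cutoffs, whereas the revised model meets all four, and the recomputed RMSEA values land close to the reported .070 and .026.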

Results 

The descriptive statistics, reliabilities, and correlations among
variables are presented in Table 2. In particular, participants obtained
an average score of 12.587 of 18 (69.9%) in the arithmetic word-problem
task, indicating a reasonable mastery. Of the 30.1% of responses that were
incorrect, 22.6 percentage points were due to errors in number-sentence
construction, and the remaining 7.5 percentage points were computation
errors. The findings indicated that the participants’ difficulty in
solving word problems resided more with number-sentence construction than
with computation.
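
The percentages in this paragraph follow from simple arithmetic on the reported values, as the short check below shows. Variable names are illustrative; the inputs are the figures given in the text and in Table 2.

```python
mean_score = 12.587   # mean answer score from Table 2
max_score = 18        # number of word problems

accuracy = 100 * mean_score / max_score
error_rate = 100 - accuracy

construction_errors = 22.6  # percentage points, as reported
computation_errors = 7.5    # percentage points, as reported
construction_share = 100 * construction_errors / (
    construction_errors + computation_errors
)

print(round(accuracy, 1), round(error_rate, 1), round(construction_share, 1))
```

The last figure shows that roughly three quarters of all errors involved number-sentence construction, the proportion cited later in the Discussion.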

The path model was set up to examine the relations between the various
cognitive capacities and the two component arithmetic word-problem solving
skills (number-sentence construction and computation) as well as the
relative contributions of number-sentence construction and arithmetic
computation to future arithmetic computation performance and mathematics
achievement. All of the number-specific and domain-general predictors
(from Time 1 to Time 3) were used to predict both number-sentence
construction and arithmetic computation at Time 3, and these two variables
were used to predict arithmetic computation and mathematics achievement at
Time 4. Arithmetic computation at Time 4 was also expected to predict
mathematics achievement at the same time point. Covariations among
exogenous variables were not included in the model.³ The model did not fit
the data very well, with χ²(16, N = 143) = 27.359, p = .038, CFI = .953,
RMSEA = .070, and SRMR = .039. To improve model fit, the modification
indices were examined. A direct path from nonverbal intelligence to
mathematics achievement was suggested. Because nonverbal intelligence
represents one’s general learning capacity and has been found to predict a
large variety of academic outcomes (Deary, Strand, Smith, & Fernandes,
2007), this suggested path was deemed theoretically sound; therefore, it
was added to the model. Thereafter, model fit improved, with χ²(15, N =
143) = 16.404, p = .356, CFI = .994, RMSEA = .026, and SRMR = .030. Among
all of the predictor variables, only symbolic-nonsymbolic mapping (β =
.209, p = .005), nonverbal intelligence (β = .166, p = .033), and word
reading (β = .234, p = .006) significantly predicted arithmetic
computation, and only nonverbal intelligence (β = .273, p = .001) and
phonological loop (β = .212, p = .010) significantly predicted
number-sentence construction. Both number-sentence construction (β = .218,
p = .007) and arithmetic computation (β = .373, p < .001) at Time 3 were
predictive of arithmetic computation at Time 4, with 27.5% of the variance
being explained. Children’s mathematics achievement was significantly
predicted by number-sentence construction at Time 3 (β = .302, p < .001),
arithmetic computation at Time 3 (β = .245, p = .001) and Time 4 (β =
.205, p = .001), and nonverbal intelligence (β = .220, p = .001), with
53.8% of the variance being explained through these variables (see Figure
2 for details). All of the aforementioned pathways were further examined
using a bootstrapping procedure with 5,000 draws, and the pattern was
exactly the same.

³ The examination of modification indices suggested that the inclusion of
covariations among exogenous variables did not significantly improve the
model fit. None of the modification indices were greater than the critical
value of 3.84.
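
The logic of a bootstrap check with 5,000 draws can be sketched on a single bivariate relation. This is not the procedure used for the path model, which was estimated in lavaan; it is a simplified percentile bootstrap on simulated data, and every name and value below is an assumption made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation, equal to the standardized slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, draws=5000, alpha=0.05, seed=1):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(draws):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    cut = round(draws * alpha / 2)  # number of estimates trimmed from each tail
    return estimates[cut], estimates[draws - cut - 1]


# Simulated data only: 143 cases with a population correlation of about .6.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(143)]
y = [0.6 * a + rng.gauss(0, 0.8) for a in x]
low, high = bootstrap_ci(x, y)
print(low, high)
```

A path is treated as supported when its bootstrap interval excludes zero, which is the sense in which a bootstrapped pattern can be said to match the original estimates.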

Discussion 

The current study was conducted to examine children’s arithmetic
word-problem solving, suggested by mathematics teachers as one of the most
difficult topics in mathematics learning (National Mathematics Advisory
Panel, 2008). The arithmetic word-problem solving process was divided into
two component processes: number-sentence construction and computation. The
present findings demonstrated that children’s difficulties in solving
arithmetic word problems stemmed more from number-sentence construction
than from computation. Although children’s performance in arithmetic
computation was predicted by numerical-magnitude processing
(symbolic-nonsymbolic mapping), domain-general (nonverbal intelligence),
and word-reading capacities, only domain-general capacities (nonverbal
intelligence and phonological loop) predicted children’s number-sentence
construction skills. Both number-sentence construction and arithmetic
computation significantly predicted arithmetic computation and overall
mathematics achievement 1 year later. It is worth mentioning that
number-sentence construction at Time 3 significantly predicted arithmetic
computation at Time 4, even after the effect of arithmetic computation at
Time 3 had been controlled for. These findings highlight the significance
of number-sentence construction in arithmetic computation specifically and
in mathematics learning in general. The theoretical implications of these
results are further elaborated on in the following sections.

The Two Major Processes in Arithmetic Word-Problem Solving

The current study investigated children’s arithmetic word-problem solving
by decomposing the task into the number-sentence construction and
computation processes. The number-sentence construction process
corresponds to the first three stages of the word-problem solving process
(translation, integration, and planning) as proposed by Mayer and Hegarty
(1996), whereas the computation process matches the execution stage in the
Mayer and Hegarty (1996) model. In the current study, three quarters of
the participants’ errors in solving arithmetic word problems involved
issues with constructing number sentences. Most of the errors in
number-sentence construction were the result of misunderstandings of the
relations between the variables (e.g., directly translating the word
“more” in the problem into an addition). This finding agrees with those of
other studies, which suggest that the major difficulty in solving
arithmetic word problems lies in the conceptual understanding of the
problem (i.e., understanding the relations among the variables) and the
translation of the problem into the relevant number sentences (Hegarty et
al., 1992; Lewis & Mayer, 1987).

Cognitive  Predictors  of  Number-Sentence  Construction 
and  Computation 

The major goal of the current study was to identify the correlates of the
arithmetic word-problem solving component processes. Such investigation
has seldom been performed in previous work because these studies only
examined the arithmetic word-problem solving process as a whole (e.g.,
Kyttälä et al., 2014; Seethaler, Fuchs, Fuchs, & Compton, 2012; Tolar et
al., 2012). By separating the two component processes, the current study
yielded further information about the cognitive capacities required for
each of the component processes and, hence, improved the understanding of
them. Several patterns were observed. First, nonverbal intelligence was
related to both number-sentence construction and computation. This is not
surprising given that intelligence has been shown to be a general
predictor of many achievement tasks (Deary et al., 2007). Second, only one
of the numerical-magnitude processing skills (symbolic-nonsymbolic
mapping) predicted computation, and none of them predicted number-sentence
construction. Instead, the number-sentence construction process seems to
rely on domain-general cognitive capacities rather than number-specific
ones. This is reasonable given that computation, compared with
number-sentence construction, seems to impose higher demands on
numerical-magnitude processing. That only symbolic-nonsymbolic mapping
significantly predicted arithmetic computation suggests that it is the
association between the two forms of numerical representations (i.e.,
symbolic and nonsymbolic), rather than the comparison of magnitudes within
a single form of representation, that matters to children’s arithmetic
performance. Understanding the referent of the number symbols makes the
arithmetic tasks more meaningful because children know what the numbers
are referring to (Wong, Ho, & Tang, 2016). However, the comparison of
numerical magnitude does not seem to be directly related to children’s
arithmetic performance. This might be why the relation between mathematics
achievement and symbolic-nonsymbolic mapping (all rs > .31; Booth &
Siegler, 2006; Mejias, Grégoire, & Noël, 2012) seems to be stronger than
those between mathematics achievement and symbolic and nonsymbolic
numerical comparison (average rs = .30 and .24 for symbolic and
nonsymbolic comparison, respectively; Schneider et al., 2016).

Figure 2. Path model: relations between arithmetic word problem solving
processes and the cognitive predictors and mathematical outcomes.
* p < .05. ** p < .01. *** p < .001.

Third, the roles of working memory and word-reading skills in children’s
arithmetic word-problem solving are worth discussing. Both have been shown
to be related to arithmetic word-problem solving in previous studies
(e.g., Bjork & Bowyer-Crane, 2013; Zheng et al., 2011), but their exact
roles in the different arithmetic word-problem solving processes have
remained unclear. Here, we have demonstrated that working memory and
word-reading skills made differential contributions to the two
problem-solving component processes. Whereas working memory, or the
phonological loop in particular, was related to number-sentence
construction but not to computation, word-reading skills were related to
computation but not to number-sentence construction. Working memory may
enhance the number-sentence construction process by providing capacity to
keep the information presented in the word problem in mind for further
processing (Swanson, 2011), and the finding that only the phonological
loop, but not the visuospatial sketchpad, significantly predicted
number-sentence construction is consistent with the suggestion that
school-age children tend to represent arithmetic problems in verbal format
(Rasmussen & Bisanz, 2005). The difference in predictive power between the
central executive and the phonological loop with regard to arithmetic
word-problem solving seems to indicate that it is the storage component,
instead of the manipulation component, that plays a role in children’s
arithmetic word-problem solving. However, given the strong relation
between the central executive and mathematical skills observed in other
studies (Lee & Bull, 2015; Zheng et al., 2011), along with the relatively
low reliability of the central executive measure in this study (Cronbach’s
α = .503), more studies are necessary for further clarification.

The shared variance between word reading and computation, which has been
reported in other studies as well (Fuchs, Fuchs, Compton, Hamlett, & Wang,
2015; Hecht, Torgesen, Wagner, & Rashotte, 2001), can be explained by
their common component of retrieval of information from long-term memory
(Hecht et al., 2001) as well as by their common requirement of
phonological awareness (Bjork & Bowyer-Crane, 2013; Hecht et al., 2001; Ho
& Bryant, 1997). If access to information in long-term memory is not
efficient or if the child is not sensitive to the phonetic structure of
human speech, then the processes of retrieval and encoding of information
(e.g., words and arithmetic facts) will be affected, resulting in poor
performance in both word reading and computation. Future studies may
distinguish between these possibilities by including measures of long-term
memory retrieval and phonological awareness. It should be noted that the
arithmetic word problems were read to the participants in the current
study, and this might have reduced the demand upon word-reading skills
with respect to arithmetic word-problem solving processes. However, in
real-life classrooms children might have to read the problems themselves,
placing greater demands on word-reading skills; thus, the relation between
the two might be stronger.

Significance  of  Number-Sentence  Construction 

In previous studies on arithmetic word-problem solving, overall accuracy
is usually the only outcome (or one of only two outcomes, besides
computation; e.g., Bjork & Bowyer-Crane, 2013; Zheng et al., 2011).
Therefore, these studies do not show how arithmetic word-problem solving
or, in particular, components of arithmetic word-problem solving are
related to other mathematical tasks. By decomposing arithmetic
word-problem solving into its two major component processes, such
information can be obtained. In particular, the significance of the
number-sentence construction process for various mathematical outcomes
could be revealed. It was observed that children’s number-sentence
construction skills in first grade significantly predicted their overall
mathematics achievement in second grade even after taking children’s
computation skills from first grade into account. This suggests that
number-sentence construction and computation make their own unique
contributions to children’s mathematics learning. Number-sentence
construction is different from computation in that it requires the ability
to grasp the relations among the variables of ill-defined problems and the
flexibility to devise relevant strategies to solve them. These skills can
be important in many aspects of mathematics in addition to arithmetic
(e.g., finding the area of complex figures and extracting relevant
information from graphs and charts) and may explain why number-sentence
construction uniquely predicted children’s mathematics achievement.

Children’s number-sentence construction skills were also found to predict
arithmetic computation 1 year later, even after the effect of earlier
computation skills had been controlled for. This echoes what Swanson
(2014) discovered: number-sentence construction intervention involving
visual strategies benefits students with math difficulties in terms of
both arithmetic word-problem solving and arithmetic computation (but see
Fuchs et al., 2014, for a null finding). These converging findings suggest
that the ability to construct number sentences from arithmetic word
problems may itself foster children’s acquisition of arithmetic
computation. One possible explanation for this relation is that the
ability to construct number sentences reflects a better conceptual
understanding of the important mathematical principles of arithmetic
operations. For example, children who possess full mastery of the
underlying meaning of various arithmetic operations may be able to
accurately and efficiently convert a word problem into the right number
sentences. At the same time, with better understanding of the meaning of
the basic arithmetic operations, the acquisition of new arithmetic
operations should be easier (e.g., the concept of multiplication is
essentially built on the concept of addition; therefore, a concrete
foundation of the concept of addition may facilitate the acquisition of
multiplication). Thus, this conceptual understanding could bring about
better performance in arithmetic computation. Future studies should
continue to investigate the relations among number-sentence construction,
arithmetic computation, and the conceptual understanding of arithmetic
operations.

Limitations  and  Suggestions  for  Future  Research 

There were several potential limitations in the current study. The first
involves cross-cultural differences in the involvement of number sentences
in arithmetic word-problem solving. Although writing down number sentences
is a common practice in arithmetic word-problem solving in Hong Kong, it
may not be common in other countries. Readers need to be cautious about
the generalizability of the current findings to settings where writing
number sentences is not frequently performed. The second potential
limitation concerns the coverage of correlates. As a consequence of the
time constraint on child assessment, we were unable to include a
comprehensive list of cognitive skills related to children’s arithmetic
word-problem solving. Various language skills, such as print awareness,
vocabulary, and listening and reading comprehension, are suggested to be
related to either mathematics achievement in general (Hindman, Skibbe,
Miller, & Zimmerman, 2010; Purpura, Hume, Sims, & Lonigan, 2011;
Sonnenschein, Thompson, Metzger, & Baker, 2013) or arithmetic word-problem
solving performance in particular (Boonen, van der Schoot, van Wesel, de
Vries, & Jolles, 2013; Kyttälä et al., 2014; Vilenius-Tuohimaa, Aunola, &
Nurmi, 2008). Among the language skills, mathematics language (the
understanding of vocabulary, such as “more” and “less”) was considered to
be particularly important (Purpura & Logan, 2015; Toll & Van Luit, 2014).
Further inclusion of these measures in future studies might provide a more
complete understanding of factors contributing to children’s ability to
solve arithmetic word problems. The third potential limitation relates to
the time points at which the measures were conducted. Again, as a result
of the time constraint, each construct (except for arithmetic computation)
was measured only once. If the constructs were assessed at multiple time
points and the predictor measures could be conducted at the same time
point, then the interrelations among the constructs could be more clearly
articulated. The final potential limitation concerns the measures of
outcomes. To economize the testing time, the current study only assessed
children’s general mathematics achievement rather than children’s
knowledge in different domains of mathematics (e.g., shape and space,
measurement, statistics). Therefore, specific relations between children’s
number-sentence construction skills and arithmetic computation skills with
different domains of mathematics remain to be explored. Despite the
inability to infer such relations, the significant correlations between
the two arithmetic word-problem component skills and general mathematics
achievement are themselves important because they underscore the
significance of the two skills in children’s general mathematics learning.

Educational  Implications 

The current findings have several implications. As demonstrated through
the current findings, children’s arithmetic word-problem solving can be
divided into two major component processes: number-sentence construction
and arithmetic computation. The two processes have their own unique
cognitive predictors, and they independently contribute to future
mathematics achievement. This indicates that the two processes are
related, yet distinct, constructs, raising the possibility that different
children may fail in arithmetic word-problem solving for different
reasons. Some may fail because of their inability to construct the right
number sentence; others may fail because they err in arithmetic
computation. In either case, interventions would be more effective if they
targeted the right area. The discovery of the correlates of the two
component skills also informs the potential intervention directions for
each. For intervention in number-sentence construction, educators might
want to teach students ways to reduce their memory load when they solve
arithmetic word problems. The two-step approach suggested by Fuchs et al.
(2014), by which children read the problem twice (once for identifying the
essential structure of the problem and writing down the meta-equation and
again for filling in the numbers of the meta-equation) before solving the
problem, is an excellent way to reduce children’s memory load when solving
arithmetic word problems. Other ways to diminish working memory load while
solving word problems include the use of reminders, number lines, and
diagrams (Lewis, 1989; Muth, 1991; Swanson, 2014). Conversely,
interventions that target children’s precision in symbolic-nonsymbolic
mapping may be particularly effective in improving children’s arithmetic
computation skills (Kucian et al., 2011; Siegler & Ramani, 2009).
Considering the significance of the number-sentence construction process
in general mathematics learning, efforts devoted to enhancing
number-sentence construction skills are likely to bring about greater
mathematics achievement among students.

Conclusions 

The  current  study  examined  children’s  arithmetic  word-problem 
solving  by  separating  the  problem-solving  process  into  the 
number-sentence  construction  process  and  the  computation  pro¬ 
cess.  The  two  processes  were  differentially  predicted  by  various 
cognitive  capacities  (number-sentence  construction  was  only  pre¬ 
dicted  by  domain-general  skills  whereas  arithmetic  computation 
was  predicted  by  numerical-magnitude  processing,  word  reading, 
and  domain-general  skills),  and  both  processes  longitudinally  pre¬ 
dicted  future  computation  and  general  mathematics  achievement 
after  controlling  for  the  autoregressive  effect  of  computation.  The 
findings  presented  here  have  provided  a  better  understanding  of 
children’s arithmetic word-problem solving processes and emphasized
the importance of instruction in number-sentence construction.

National data have shown for decades that Black students experience more frequent and severe
disciplinary  actions  that  remove  them  from  school  (e.g.,  suspension),  compared  with  their  White  peers. 
Despite  extensive  research  documenting  the  sequelae  associated  with  suspension  (e.g.,  school  drop-out 
and  delinquency),  there  has  been  relatively  scant  research  addressing  the  discipline  gap  as  it  relates  to 
students’  sense  of  belonging  and  equitable  treatment  at  school,  or  to  potential  adjustment  problems  it  may 
evoke. The present observational study examined the Black-White discipline gap in 58 high schools with
a sample of 19,726 adolescents (Black n = 7,064, White n = 12,662) in Maryland. Employing a
multilevel  framework  and  leveraging  data  from  the  U.S.  Department  of  Education’s  Office  of  Civil 
Rights  and  the  student-report  Maryland  Safe  and  Supportive  Schools  (MDS3)  School  Climate  Survey,  we 
characterized  58  high  schools  by  their  excess  in  Black  relative  to  White  student  risk  of  out-of-school 
suspension.  We  then  assessed  whether  Black  students’  excess  risk  of  out-of-school  suspension  was 
negatively  associated  with  perceived  school  equity  and  school  belonging,  and  positively  associated  with 
adjustment  problems  (i.e.,  externalizing  symptoms)  in  a  stratified  analysis  of  White  and  Black  students. 
We  found  that  school-level  discipline  gaps  were  associated  with  Black  students’  perceptions  of  less 
school equity (γ = −.54, p < .001), less school belonging (γ = −.50, p < .001), and increased
adjustment problems (γ = .77, p < .001), even when accounting for student demographics (i.e., gender,
grade  level,  socioeconomic  status)  and  school-level  contextual  factors  (i.e.,  socioeconomic  status,  student 
diversity,  overall  suspension  rates),  whereas  these  associations  were  not  significant  for  White  students. 
Study  findings  have  implications  for  educational  reform  in  high  schools  in  which  out-of-school  suspen¬ 
sion  practices  differ  by  race. 

Keywords: discipline gap, discipline disproportionality, race, peer relations, adjustment

How high school students feel about the climate of their schools
impacts  their  achievement  and  success.  This  study  suggests  that  Black 
students  feel  differently  about  how  equitable  and  inclusive  their 
schools  are,  depending  on  the  extent  to  which  Black  students  are 
disproportionately  suspended  at  their  schools.  In  schools  that  differ¬ 
entially  suspended  Black  students,  Black  students  reported  less  school 
belonging  and  equitable  treatment,  and  more  adjustment  problems, 
relative  to  Black  students  in  schools  with  lesser  discipline  disparities. 
These  patterns  of  association  were  not  found  for  White  students. 
Study  findings  suggest  that,  in  addition  to  alternatives  to  suspension 
and  equity  focused  interventions  to  eliminate  the  gap,  more  immediate 
social,  emotional,  and  psychological  supports  for  Black  youth  in 
schools with highly differential discipline practices may be needed.

Studies examining demographic correlates of school discipline
exposure have found overwhelmingly that Black students
are  disciplined  at  higher  rates  than  White  students.  This  trend 
has  persisted  across  a  range  of  different  types  of  school  sanc¬ 
tions  (Gregory,  Skiba,  &  Noguera,  2010),  including  office 
discipline  referrals,  suspensions,  and  expulsions  (Krezmien, 
Leone,  &  Achilles,  2006;  Losen,  Hodson,  Keith,  Morrison,  & 
Belway,  2015;  Porowski,  O’Conner,  &  Passa,  2014;  Skiba  et 
al.,  2011;  Smith  &  Harper,  2015;  Vincent,  Swain-Bradway, 
Tobin,  &  May,  2011;  Wallace,  Goodkind,  Wallace,  &  Bach¬ 
man,  2008).  Research  examining  the  underlying  dynamics  of 
racial  discrepancies  in  school  discipline  has  uncovered  a  pattern 
of  differential  treatment,  in  which  Black  students  tended  to  be 
overrepresented  in  referrals  for  defiance  and  other  subjective 
offenses,  whereas  White  students  were  more  often  disciplined 
for  objective  infractions  (e.g.,  smoking  on  school  grounds, 
fighting;  Gregory  &  Weinstein,  2008;  Skiba,  Michael,  Nardo,  & 
Peterson,  2002).  Gender  is  also  a  factor  impacting  discipline 
disparities,  as  Black  males  are  two  times  as  likely  as  Black 
females  to  be  expelled  (KewalRamani  et  al.,  2007)  and  six 
times as likely as White females to be suspended (Gregory,
1997).

Known as the discipline gap, the excessive suspension of Black
students  in  U.S.  schools  is  deeply  concerning  in  light  of  evidence 
suggesting  its  direct  deleterious  effects  for  students  who  have  been 
suspended,  including  increased  risk  of  subsequent  contact  with  the 
juvenile  justice  system  (Fabelo  et  al.,  2011)  and  school  drop-out 
(Bradshaw,  O’Brennan,  &  McNeely,  2008).  Yet  few  studies  have 
examined  the  ecological  implications  of  the  discipline  gap  at  the 
school-level.  Specifically,  it  is  plausible  that  Black  students  in 
schools with larger Black-White discipline gaps could perceive
that  they  and  their  Black  peers  are  not  afforded  the  same  fair  and 
inclusive  treatment  as  the  White  students  at  their  school;  this  in 
turn  could  have  an  impact  on  how  welcome  Black  students  feel  at 
the  school,  and  their  sense  of  belonging  there  (Debnam,  Johnson, 
Waasdorp,  &  Bradshaw,  2014).  Social-cognitive  theory  on  recip¬ 
rocal  determinism  highlights  the  school  social  environment  as  a 
developmental  influence  on  student  adjustment  (Bandura,  1989; 
Eccles  &  Roeser,  2011),  and  thus  suggests  that  Black  students’ 
interface  with  discipline  disparities  may  further  impact  their  views 
of  themselves  as  prosocial  and  well-adjusted  members  of  the 
school.  To  assess  these  potential  relationships,  the  current  study 
explored  school-level  discipline  disparities  in  relation  to  Black  and 
White  students’  perceptions  of  school  belonging  and  equity,  as 
well  as  the  degree  to  which  they  report  experiencing  psychological 
adjustment  problems.  This  line  of  research  will  inform  our  under¬ 
standing  of  how  students’  perceptions  of  equity  and  school  be¬ 
longing  may  vary  systematically  by  their  high  schools’  degree  of 
school-level  discipline  disproportionality,  which  in  turn  may  be 
related  to  differences  in  student  adjustment  problems.  These  find¬ 
ings  also  may  inform  our  understanding  of  intervention  and  reform 
efforts  in  high  schools  targeting  racial  differences  in  their  use  of 
out-of-school  suspension. 

Potential  Contributors  to  the  Discipline  Gap 

The  discipline  gap  has  sharply  increased  in  the  four  decades 
following  the  landmark  national  report  that  first  called  attention  to 
racial  disparities  in  school  disciplinary  outcomes  (Children’s  De¬ 
fense  Fund,  1975).  Since  the  1972-73  school  year,  the  national  rate 
of out-of-school suspension for Black youth has nearly doubled
(from 12% to 23% in 2011-12), whereas for White
students, the rate only grew by 12% (from 6% to 7% in 2011-12;
Losen  et  al.,  2015).  It  could  be  argued  that  this  striking  increase  in 
school  discipline  rates  among  Black  youth  is  attributable  to  an 
increasing  frequency  or  severity  of  misconduct;  however,  empiri¬ 
cal  research  casts  doubt  on  such  assertions.  Specifically,  research 
exploring  potential  causes  of  the  discipline  gap  has  found  higher 
rates  of  sanctioning  Black  youth,  even  when  levels  of  misbehavior 
were  similar  to  their  White  peers  (Finn  &  Servoss,  2013;  Skiba  et 
al., 2002; Toldson & Lemmons, 2013). Higher rates of disciplining
Black youth also persisted even when statistically controlling
for teacher ratings of behavior (Bradshaw, Mitchell, O’Brennan, &
Leaf,  2010)  and  other  potential  confounders  such  as  poverty  level 
(Skiba  et  al.,  2011)  and  socioeconomic  status  (Wallace  et  al., 
2008).  Thus,  the  argument  that  disproportionate  discipline  prac¬ 
tices  reflect  Black  students’  elevated  misconduct  alone  has  not 
been  substantiated  (see  Skiba  &  Williams,  2014  for  a  review  of 
this  debate). 

A  number  of  researchers  have  posited  that  institutional  and 
individual biases within the school social context are likely also a
culprit in school discipline practices that disproportionately punish
Black  youth  (Gregory  &  Weinstein,  2008;  Skiba  et  al.,  2011; 
Vavrus  &  Cole,  2002;  Wald,  2014).  Sociopolitical  influences 
during  the  1980s  and  1990s  would  bolster  this  supposition.  Spe¬ 
cifically,  during  this  same  time  period,  a  fear  of  Black  youth  as 
“super-predators” (DiIulio, 1995, p. 23) was spreading in mainstream
media, and support for three strikes policies and the war on
drugs  contributed  to  the  disproportionate  incarceration  of  Blacks 
and  Latinos  (Alexander,  2010;  Gilliam  &  Iyengar,  2005;  Neal  & 
Rick,  2014).  It  is  plausible  that  similar  trends  during  that  time 
within  education,  such  as  the  increased  use  of  zero  tolerance 
policies,  contributed  to  the  surge  in  out-of-school  suspensions 
affecting  Black  youth  (American  Psychological  Association  Zero 
Tolerance  Task  Force,  2008). 

The  causes  of  widespread  and  increasing  racial  discipline  dis¬ 
parities  are  complex.  Myriad,  interconnected,  and  reciprocal  path¬ 
ways  are  likely  involved.  For  example,  research  on  stress  and  the 
expression  of  implicit  bias  (Kang,  Gray,  &  Dovidio,  2014)  sug¬ 
gests  that  teacher  and  administrator  stress  may  interact  with  biases 
to  contribute  to  differential  responses  in  disciplinary  interactions 
(McIntosh, Girvan, Horner, & Smolkowski, 2014). Other research
has  provided  evidence  of  a  bidirectional  influence  of  racial  bias 
and  disparities  in  student  functioning  on  one  another  as  part  of  a 
mutually  reinforcing  cycle  (Gregory,  Skiba,  &  Noguera,  2010; 
Shirley  &  Cornell,  2012;  Skiba  et  al.,  2002). 

Understanding  the  Meaning  of  School-Level 
Discipline  Disparities 

Regardless  of  whether  disparate  disciplinary  outcomes  are 
caused  by  school  staff  racial  bias,  Black  students  may  perceive 
racial  differences  in  discipline  rates  within  their  school  as  unfair, 
which  in  turn  may  have  detrimental  effects.  Qualitative  research 
examining students’ perceptions of the discipline gap suggests that
differential  discipline  practices  are  very  apparent  to  them  and 
perceived  as  unjust  (Sheets,  1996).  For  example,  a  participant  in  a 
qualitative  study  (Howard,  2008)  on  this  issue  commented: 

I  watch  it  all  the  time.  One  of  us  [Black  males]  do  something,  and  we 
get  suspended  or  expelled.  A  White  kid  does  the  exact  same  thing,  and 
he  gets  a  warning,  or  an  after  school  referral.  Sometimes  it’s  so 
obvious  that  they  treat  us  different  than  them.  (p.  971) 

Black students in particular report sensitivity to teacher interactions
and disciplinary actions that are inequitable (Ruck & Wortley,
2002). Other research has found that discipline disparities are
inversely  associated  with  student  perceptions  of  positive  racial 
climate  (Mattison  &  Aber,  2007)  and  Black  students’  willingness 
to  seek  help  from  teachers  (Shirley  &  Cornell,  2012). 

Taken  together,  the  literature  suggests  the  likelihood  that  schools 
with  highly  differential  patterns  of  suspension  by  race  may  be  per¬ 
ceived  by  students  as  unfair  and  less  inclusive  environments,  partic¬ 
ularly  by  Black  students,  whereas  schools  with  less  clearly  racialized 
patterns  in  discipline  may  be  perceived  as  more  equitable.  In  turn,  the 
degree  to  which  the  school  environment  is  perceived  as  fair  and 
inclusive,  or  equitable  (Organization  for  Economic  Cooperation  and 
Development (OECD), 2008), has been associated with students’
sense of belonging to the school (Debnam et al., 2014). Yet it is important
to recognize that discipline disparities within a school may have
different implications for Black and White youth. Although both
Black and White student groups may witness the more frequent
punitive  treatment  of  Black  students,  that  observation  may  be  expe¬ 
rienced  more  personally  by  Black  students  than  White  students.  For 
example,  perceptions  of  differential  treatment  have  been  linked  to  a 
host  of  detrimental  developmental  outcomes  among  youth  of  color, 
including  increases  in  problem  behavior  (Bogart  et  al.,  2013)  and 
anger  and  depression  (Wong,  Eccles,  &  Sameroff,  2003),  and  declines 
in  student  engagement  (Bingham  &  Okagaki,  2012;  Dotterer, 
McHale,  &  Crouter,  2009).  Thus,  it  is  plausible  that  exposure  to 
school  contexts  in  which  Black  students  are  suspended  at  higher  rates 
than  White  students  may  be  negatively  associated  with  perceived 
school  equity  and  school  belonging,  and  positively  associated  with 
adjustment  problems  for  Black  youth,  whereas  we  might  expect  little 
to  no  influence  on  White  youths’  perceptions  and  outcomes.  How¬ 
ever,  no  research  to  date  has  empirically  examined  discipline  dispar¬ 
ities  as  a  school  contextual  factor  associated  with  Black  and  White 
students’  perceptions  of  themselves  or  their  school.  The  present  study 
explored the contextual influence within the school social environment
as a potential pathway by which the discipline gap may have
deleterious  effects  for  Black  youth. 

The  Present  Study 

The  purpose  of  this  study  was  to  inform  our  understanding  of  the 
discipline  gap  as  a  contextual  factor  linked  to  students’  perceived 
school  equity,  school  belonging,  and  adjustment  problems.  We  em¬ 
ployed  a  multilevel  latent  variable  approach  utilizing  counts  of  stu¬ 
dents  who  received  one  or  more  out-of-school  suspensions  in  the 
2011-12  school  year,  available  disaggregated  by  race  from  the  Civil 
Rights  Data  Collection  of  the  U.S.  Department  of  Education,  Office 
of  Civil  Rights  at  the  school  level.  At  the  student  level,  we  employed 
student  report  data  from  the  Maryland  Safe  and  Supportive  Schools 
(MDS3)  School  Climate  Survey  from  the  subsequent  year  (2012-13). 
We  conducted  these  analyses  drawing  upon  a  sample  of  19,726  Black 
and  White  students  in  58  Maryland  high  schools.  We  hypothesized 
that  school-level  disparity  in  out-of-school  suspensions,  operational¬ 
ized  as  excess  risk  of  out-of-school  suspension  relative  to  White 
students,  would  be  inversely  associated  with  Black  students’  percep¬ 
tions  of  school  equity  and  belonging  (Debnam  et  al.,  2014;  Dotterer  et 
al.,  2009),  and  positively  associated  with  self-reported  adjustment 
problems  (i.e.,  externalizing  symptoms  such  as  getting  mad  easily; 
Bogart  et  al.,  2013),  whereas  we  expected  no  significant  associations 
in  the  White  sample.  We  included  student-level  demographics  (gen¬ 
der,  grade  level,  and  maternal  education  as  a  proxy  for  socioeconomic 
status)  and  school-level  factors  (percentage  enrollment  eligible  for 
free  and  reduced  price  meals,  overall  suspension  rate,  and  diversity  of 
student  enrollment)  as  covariates  to  reduce  the  risk  of  confounding  in 
the  analysis.  This  research  has  potential  to  inform  policy  efforts  and 
programmatic  targets  to  mitigate  the  effects  of  school  discipline 
disproportionality. 

Method 

Participants 

The  sample  included  19,726  students  (Black  n  =  7,064  and 
White  n  =  12,662)  in  58  suburban  and  rural  Maryland  public  high 
schools.  All  students  reporting  essential  demographics  (e.g.,  race) 
in the MDS3 school climate survey and all schools for which we
had available student survey data were included. We utilized
out-of-school suspension data from the year prior (2011-12) to the
year  student  self-report  data  were  collected  (2012-13).  The  sample 
of Black students was 49.8% female, had a mean age of 15.9, and
39.6% reported their mother had graduated from college; similarly,
the White student sample was 50.0% female, had a mean age of 15.9,
and  44.1%  reported  their  mother  had  graduated  from  college.  The 
58  Maryland  high  schools  had  an  average  of  37.5%  low-income 
students  and  a  mean  out-of-school  suspension  rate  of  17.2%. 
School enrollment averaged 1,262.9 (SD = 411.8). From the Civil
Rights  Data  Collection,  the  average  excess  in  Black  suspension 
risk  (possible  range  —1.00  to  1.00)  was  .11  (SD  =  .09).  An 
average  of  122  Black  students  per  school  and  218  White  students 
per  school  provided  data  for  the  study.  Additional  demographic 
characteristics  are  presented  in  Table  1. 

Procedure 

We  conducted  a  secondary  analysis  of  school  climate  survey 
data  from  the  MDS3  school  group  randomized  controlled  trial. 
Half  of  the  trial  schools  were  randomly  assigned  to  receive  training 
in  positive  behavior  supports  (i.e.,  coaching  on  implementation  of 
multitiered  behavior  support  strategies  and  interventions),  and  the 
other  half  were  randomized  to  a  “business-as-usual”  condition, 
meaning  that  they  received  no  additional  supports.  For  our  sec¬ 
ondary  analysis,  both  conditions  were  included  (condition  was 
controlled  in  the  analysis).  For  the  MDS3  trial,  12  of  the  state’s  24 
districts  were  approached  by  the  Maryland  State  Department  of 
Education  (MSDE)  based  on  perceived  need.  High  schools  were 
then  invited  to  participate  in  the  MDS3  project  on  a  voluntary 
basis.  Anonymous  data  were  collected  via  a  waiver  of  active 
parental  consent  and  a  youth  assent  process.  All  student  participa¬ 
tion  was  voluntary.  The  MDS3  School  Climate  Survey  (Bradshaw, 
Waasdorp,  Debnam,  &  Johnson,  2014)  was  administered  online  in 
language  arts  classrooms  to  approximately  25  classrooms  per 
school, with an approximate distribution as follows: seven ninth grade
classrooms  and  six  each  of  tenth,  eleventh,  and  twelfth  grade 
classrooms.  School  staff  administered  the  survey  following  a  writ¬ 
ten  protocol.  The  researchers’  Institutional  Review  Board  ap¬ 
proved  analysis  of  these  data.  For  additional  information  on  the 
project,  see  Bradshaw  et  al.  (2014). 

Measures 

Student  (Level  1).  The  source  of  student-report  data  for  the 
current  study  was  the  MDS3  School  Climate  Survey,  which  was 
developed  by  a  collaborative  led  by  the  Johns  Hopkins  Center  for 
Youth  Violence  Prevention;  for  additional  details,  see  Bradshaw  et 
al. (2014). Cronbach’s alphas (α) were calculated in the current
sample, and are reported below, to assess the internal consistency
reliability of key constructs in the study.
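
As an illustration of the internal consistency statistic named above, the following minimal sketch computes Cronbach's alpha from item-level responses using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score). The function name and the toy responses are hypothetical and are not taken from the MDS3 data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of columns, one per item; each column holds one score
    per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(col) for col in items)
    # Sample variance of each respondent's total (summed) score.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 4-point Likert responses from five students on three items.
toy_items = [
    [1, 2, 3, 4, 4],
    [2, 2, 3, 4, 3],
    [1, 3, 3, 3, 4],
]
print(f"alpha = {cronbach_alpha(toy_items):.2f}")
```

Values closer to 1 indicate that the items vary together; the sketch assumes complete data and does not handle missing responses.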

Equity. The equity scale (α = .82; Bradshaw et al., 2014;
Debnam et al., 2014; Haynes, Emmons, & Ben-Avie, 2001) was
utilized to assess students’ perceptions of school equity and cultural
inclusion. Three items assessed students’ perceptions of equitable
treatment based on race, gender, and socioeconomic status
(e.g., “At this school, students of all races are treated the same”),
and  a  fourth  item  assessed  cultural  inclusiveness  (i.e.,  “The  school 
provides instructional materials that reflect my culture”). The four
item response options were on a 4-point Likert scale from disagree
strongly (1) to agree strongly (4), with higher scores indicating
higher levels of perceived equity.

Table 1

Student and School Characteristics

Student characteristics                  Black students     White students
                                         (n = 7,064)        (n = 12,662)
                                         n (%)              n (%)

Maternal education
  Did not graduate from high school      641 (9.1)          995 (7.9)
  Graduated from high school             1,919 (27.2)       3,698 (29.2)
  Attended some college                  1,706 (24.2)       2,388 (18.9)
  Graduated from college                 2,798 (39.6)       5,581 (44.1)
Grade level
  9th grade                              2,100 (29.7)       3,566 (28.2)
  10th grade                             1,693 (23.9)       3,240 (25.6)
  11th grade                             1,646 (23.3)       3,104 (24.5)
  12th grade                             1,625 (23.0)       2,752 (21.7)
Age^a                                    15.9 (1.3)         15.9 (1.2)
Gender
  Male                                   3,547 (50.2)       6,337 (50.0)
  Female                                 3,517 (49.8)       6,325 (50.0)

School characteristics (J = 58 schools)                     M (SD)

  School size (M)                                           1,262.9 (462.9)
  Free and reduced price meals (%)                          37.5 (17.8)
  Student diversity (M)                                     .57 (.20)
  Suspension rate (%)                                       17.2 (12.1)
  Suspension risk excess (Black risk minus White risk)      .11 (.1)

^a Age represents mean with standard deviation in parentheses.

School  belonging.  Three  items  from  the  survey  were  utilized 
to  assess  students’  sense  of  belonging  (i.e.,  “At  this  school  .  .  .” 
[stem]:  “I  feel  like  I  belong,”  “I  feel  close  to  people,”  and  “I  feel 
like I am part of this school”; α = .81). The items were adapted
from  the  California  Healthy  Kids  Survey  (Hanson  &  Kim,  2007; 
also  see  Resnick  et  al.,  1997)  and  response  options  were  on  a 
4-point  Likert  scale  from  disagree  strongly  (1)  to  agree  strongly 
(4),  with  higher  scores  indicating  higher  levels  of  belonging. 

Adjustment problems. The adjustment problems scale included
four items that measured the frequency of students’ externalizing
symptoms (i.e., “I have trouble controlling my temper,” “I
have threatened to hit or hurt someone,” “I do things without
thinking,” and “I get mad easily”; α = .80; Bradshaw et al., 2014)
on  a  4-point  Likert  scale  from  never  (1)  to  almost  always  (4),  with 
higher  scores  indicating  higher  levels  of  adjustment  problems. 
These  items  were  adapted  from  the  BASC-2  externalizing  scale 
(Reynolds  &  Kamphaus,  2004). 

Demographics. Students also responded to a series of questions
regarding demographic characteristics, including grade level,
gender, and socioeconomic status (SES). Grade levels were ninth,
tenth, eleventh, or twelfth. Gender was coded 2 = male, 1 =
female. SES (i.e., maternal education level) was on a scale from
(a) did not graduate high school, (b) graduated from high school,
(c) attended some college, to (d) graduated from college, with a
higher score signifying more education and thus higher SES.

School  (Level  2).  School-level  demographic  data  for  the 
2012-13  school  year  were  obtained  from  the  MSDE,  with  the 
exception  of  suspension  data  disaggregated  by  race  and  ethnicity. 
Out-of-school suspension data disaggregated by race and ethnicity
from the prior school year (2011-12) were obtained from the Civil
Rights  Data  Collection  (CRDC;  U.S.  Department  of  Education, 
Office  of  Civil  Rights,  2013).  Anonymous,  cross-sectional  student 
report  data  for  this  study  were  collected  online  as  part  of  the  MDS3 
initiative  in  spring  2013. 

Racial  gap  in  out-of-school  suspension.  School  discipline 
data  from  the  Office  of  Civil  Rights  CRDC  were  available  disag¬ 
gregated  by  race  and  gender  on  the  number  of  students  who 
received  one  suspension,  more  than  one  suspension,  and  total 
enrolled  for  the  2011-12  school  year  for  each  of  the  58  schools 
included  in  the  study.  For  our  purposes,  these  count  data  were 
aggregated  to  create  counts  of  students  reflecting  both  genders 
(male  and  female  combined)  and  who  received  one  or  more 
suspensions.  This  allowed  us  to  calculate  a  measure  of  excess  risk 
(calculated  as  Black  students’  risk  of  out-of-school  suspension 
minus  White  students’  risk  of  out-of-school  suspension).  Research 
suggests  that  White  students  can  serve  as  an  acceptable  index 
group  for  calculating  risk  (Skiba,  Poloni-Staudinger,  Gallini,  Sim¬ 
mons,  &  Feggins-Azziz,  2006).  Risk  of  suspension  for  Black 
students was calculated as the number of Black students suspended
(B_s) within each school divided by the total number of Black
students (T_B) enrolled in the school (B_s/T_B). Risk of suspension
for White students was calculated as the number of White students
suspended (W_s) within each school divided by the total number of
White students (T_W) enrolled in the school (W_s/T_W). Then, excess
Black risk of out-of-school suspension was calculated by subtracting
the risk of suspension among White students from the risk of
suspension among Black students [(B_s/T_B) − (W_s/T_W)]. Based on
this  calculation,  the  average  excess  risk  of  suspension  among 
Black students was .11 (SD = .09). The range for the Black-White
suspension risk excess was −.14 to +.31; only four schools had
excess risk less than 0 (meaning that in four schools, the risk of
out-of-school suspension was higher for White students than Black
students).  In  eight  schools,  enrollment  disaggregated  by  race  was 
not  available  on  the  CRDC  web  site.  For  these  schools,  enrollment 
by  race  was  obtained  from  MSDE.  Overall  suspension  rates  were 
also  obtained  from  MSDE. 
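
The excess-risk calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the counts in the example are hypothetical, and only the arithmetic (Black risk minus White risk, each risk being students suspended divided by students enrolled) comes from the text.

```python
def suspension_risk(suspended, enrolled):
    """Proportion of enrolled students with one or more out-of-school suspensions."""
    if enrolled <= 0:
        raise ValueError("enrollment must be positive")
    if not 0 <= suspended <= enrolled:
        raise ValueError("suspended count must lie between 0 and enrollment")
    return suspended / enrolled


def excess_black_risk(black_suspended, black_enrolled,
                      white_suspended, white_enrolled):
    """Black risk minus White risk; the possible range is -1.00 to 1.00."""
    black_risk = suspension_risk(black_suspended, black_enrolled)
    white_risk = suspension_risk(white_suspended, white_enrolled)
    return black_risk - white_risk


# Hypothetical school: 40 of 200 Black students and 45 of 500 White
# students received at least one out-of-school suspension.
gap = excess_black_risk(40, 200, 45, 500)  # 0.20 - 0.09
print(f"excess risk = {gap:.2f}")
```

A positive value means Black students were suspended at a higher rate than White students in that school; a negative value means the reverse, as in the four schools noted above.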

School  SES.  Data  from  MSDE  were  used  to  measure  school 
SES,  which  was  based  upon  the  percentage  of  students  receiving 
free  or  reduced  price  meals  (FARMs).  A  higher  FARMs  rate 
indicated a higher concentration of students from low-SES backgrounds.
Receipt of FARMs has been shown to be a valid indicator
of  low  household  income  (Ensminger  et  al.,  2000). 

Student  diversity.  Racial/ethnic  diversity  of  each  school  was 
characterized  using  a  normalized  generalized  variance  (NGV)  sta¬ 
tistic  (Budescu  &  Budescu,  2012;  Simpson,  1949),  which  can  be 
interpreted  as  the  probability  of  randomly  selecting  two  individu¬ 
als  from  a  given  population  that  belong  to  different  subgroups 
(Budescu  &  Budescu,  2012),  wherein  the  higher  the  value,  the 
higher the diversity of the population. The statistic was standardized
(“normalized”) to create a relative measure of diversity allowing
for direct comparisons across groups (bounded ratio 0 < GV < 1).
Groups included in the calculation of the statistic for all
schools were school-level percentage Black, White, Latino, American
Indian/Alaska Native, Asian, and multiethnic/multiracial, as
reported  in  concurrent  school  enrollment  records  from  the  Mary¬ 
land  State  Department  of  Education. 
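
The diversity statistic described above can be sketched as follows. This illustration rests on assumptions: it takes the unnormalized index to be one minus the sum of squared group proportions (the probability that two randomly drawn students belong to different groups) and normalizes it by its maximum for C groups, 1 − 1/C. The function name and the enrollment percentages are hypothetical.

```python
def normalized_diversity(shares):
    """Normalized diversity index for one school.

    shares: enrollment counts or percentages, one per racial/ethnic group.
    Returns a value from 0 (one group only) to 1 (all groups equal in size).
    Assumption: the number of groups C is the length of `shares`, so groups
    with zero enrollment still count toward C.
    """
    c = len(shares)
    if c < 2:
        raise ValueError("need at least two groups")
    total = sum(shares)
    if total <= 0 or any(s < 0 for s in shares):
        raise ValueError("shares must be non-negative with a positive sum")

    proportions = [s / total for s in shares]
    # Probability that two randomly selected students differ in group.
    gv = 1 - sum(p * p for p in proportions)
    # Divide by the largest value gv can take with c groups.
    return gv / (1 - 1 / c)


# Hypothetical percentages for six groups in one school.
school = [35.0, 45.0, 10.0, 0.5, 5.5, 4.0]
print(f"normalized diversity = {normalized_diversity(school):.2f}")
```

Because the index is rescaled by its maximum, schools can be compared on a common 0 to 1 scale even if the number of groups differs.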

Condition.  The  data  for  the  present  study  were  collected  early 
in  the  implementation  of  the  multiyear  initiative  (Year  2),  and  thus 
intervention  effects  were  not  expected.  Furthermore,  our  analyses 
did  not  reveal  intervention  effects  on  the  study  variables.  Never¬ 
theless,  a  school-level  intervention  condition  variable  was  included 
as  a  control  variable  in  the  models. 

Overview  of  Analyses 

Missing  data.  After  dropping  students  with  missing  demo¬ 
graphic  data  (e.g.,  race),  descriptive  analyses  found  very  little 
missing  data  in  the  student  outcomes  (<1%  of  students  failed  to 
report  on  one  or  more  of  the  items  from  the  demographic  or 
outcome  measures).  As  a  result,  the  reason  for  missingness  was 
judged  to  be  random  after  adjusting  for  observed  covariates  (i.e., 
student  level  demographics  were  included  in  the  model;  Rubin, 
1976),  and  data  were  assumed  to  be  missing  at  random  (MAR; 
Arbuckle  &  Wothke,  1999).  Weighted  least  squares  estimation 
was  used  in  the  analysis,  which  requires  somewhat  more  restrictive 
assumptions  than  with  the  maximum-likelihood  estimator  (i.e., 
when  missingness  is  only  correlated  with  the  exogenous  variables), 
but  yields  consistent  estimates  when  these  conditions  are  met 
(Asparouhov  &  Muthen,  2010). 

Measurement  invariance.  We  examined  measurement  in¬ 
variance  in  the  factor  structure  of  a  three-factor  model  of  perceived 
equity,  school  belonging,  and  adjustment  problems  between  the 
Black  and  White  student  groups  through  a  series  of  configural, 
metric,  and  scalar  models  (Meredith,  1993),  fit  through  multiple 
group CFA in Mplus with WLSMV estimation. To test metric
invariance,  we  constrained  factor  loadings  to  be  equal  across 
groups.  Scale  factors  were  fixed  at  one  in  one  group  and  free  in  the 
other  group.  Factor  variances  were  free  to  vary  across  groups,  and 
factor  means  were  fixed  at  zero  in  one  group  and  free  in  the  other 
group.  To  test  scalar  invariance,  we  constrained  factor  loadings 


and  thresholds  to  be  equal  across  groups.  Scale  factors  were  fixed 
at  one  in  one  group  and  free  in  the  other  group,  and  factor  means 
were  fixed  at  zero  in  one  group  and  free  in  the  other  group.  Factor 
variances  were  free  to  vary  across  groups.  Following  guidelines 
given  by  Cheung  and  Rensvold  (2002),  measurement  invariance 
was  found,  with  the  multigroup  model  demonstrating  adequate  fit 
to  the  data  and  the  difference  in  CFI  between  models  at  less  than 
.01.  Specifically,  when  comparing  metric  against  configural  mod¬ 
els,  x2  =  91.90  ( df=  8),  p  <  .001,  ACFI  =  .000,  ATLI  =  .001, 
and  ARMSEA  =  -.002.  When  comparing  scalar  against  config¬ 
ural  models,  X2  =  447.47  ( df  =  27),  p  <  .001,  ACFI  =  -.000, 
ATLI  =  .001,  and  ARMSEA  =  -.002.  When  comparing  scalar 
against  the  constrained  metric  model,  \2  =  414.59  {df  =  19),  p  < 
.001,  ACFI  =  -.001,  ATLI  =  .000,  and  ARMSEA  =  .000.  These 
findings  supported  the  assumption  of  measurement  invariance  by 
race. 
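
The ΔCFI decision rule described above can be illustrated with a short sketch. It assumes the criterion is read as an absolute change in CFI below .01; the function name `invariance_supported` is hypothetical, and the ΔCFI values are those reported in the text.

```python
def invariance_supported(delta_cfi, threshold=0.01):
    """Return True when the absolute change in CFI is below the threshold."""
    return abs(delta_cfi) < threshold

# Delta-CFI values reported in the text for each nested-model comparison
comparisons = {
    "metric vs. configural": 0.000,
    "scalar vs. configural": -0.000,
    "scalar vs. metric": -0.001,
}

# Apply the rule to every comparison
results = {name: invariance_supported(d) for name, d in comparisons.items()}
print(results)
```

The sketch covers only the ΔCFI rule; the chi-square difference tests reported in the text are significant in all three comparisons and are not part of this check.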

Collinearity.  Correlations  of  the  school-level  predictors  and 
student-level  outcomes  are  shown  in  Table  2.  To  detect  possible 
collinearity  among  the  independent  between  variables,  diagnostics 
were  conducted  to  assess  variance  inflation  factors  (VIF)  across  all 
four  predictors.  The  resulting  VIF  statistics  were  all  low,  ranging 
between  1.5  and  1.9,  suggesting  the  magnitude  of  multicollinearity 
in  this  analysis  was  low  (Kutner,  Nachtsheim,  &  Neter,  2004). 
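
As an illustration of the diagnostic, the sketch below computes VIFs as the diagonal of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 - R2) from regressing each predictor on the others. The helper names and the two-predictor correlation matrix are hypothetical; they are not the study's data and do not reproduce the reported values.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest absolute entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_from_correlations(corr):
    """Return one VIF per predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]

# Hypothetical example: two predictors correlated at r = .60
hypothetical_corr = [[1.0, 0.6],
                     [0.6, 1.0]]
vifs = vif_from_correlations(hypothetical_corr)
print([round(v, 4) for v in vifs])  # both approximately 1 / (1 - 0.6**2) = 1.5625
```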

Multilevel analyses. To examine our central research questions, we
estimated two-level models using Mplus 7.11. To establish the rationale
for our use of multilevel modeling, we calculated design effects, which
are influenced by both the intraclass correlation (ICC) and the number
of students per school. Design effects


Table 2

Correlations of the Latent Within and Observed Between Variables

Within (Latent): Black students            1          2
1. School equity
2. School belonging                        .48***
3. Adjustment problems                     -.19***    -.18***

Within (Latent): White students            1          2
1. School equity
2. School belonging                        .55***
3. Adjustment problems                     -.31***    -.28***

Between (Observed)                         1          2          3
1. FARMs
2. Suspension                              .55***
3. Student racial/ethnic diversity         .27*       .40**
4. Racial out-of-school suspension gap     -.11 (ns)  -.21 (ns)  .30*

Note. To detect possible multicollinearity among the independent
between variables, diagnostics were conducted to assess the variance
inflation factors across all four predictors. The resulting variance
inflation factor statistics were all low, ranging between 1.5 and 1.9,
suggesting the probability of multicollinearity problems in this
analysis was minimal. N = 7,064 Black students and 12,662 White
students, J = 58 schools. FARMs = percent of students in the school
receiving free and reduced-priced meals. Racial/ethnic diversity is the
normalized generalized variance (NGV) statistic, which reflects the
racial and ethnic heterogeneity diversity of the student enrollment
between .0 and 1.0, where higher proportion reflects greater diversity.

* p < .05. ** p < .01. *** p < .001. ns = nonsignificant.
estimates greater than 2.0 were taken to indicate clustering was a
significant  factor  which  would  justify  the  use  of  a  multilevel 
approach  (Peugh,  2010).  In  multilevel  logistic  regression,  there  is 
assumed  to  be  no  error  at  Level  1;  therefore,  we  assumed  the 
categorical  outcomes  followed  a  logistic  distribution  with  a  mean 
of  0  and  a  variance  of  3.29  (Snijders  &  Bosker,  1999).  The 
resulting  design  effects  ranged  from  2.84  to  16.93,  indicating  the 
need  for  multilevel  modeling  of  all  three  outcomes  (perceived 
equity,  school  belonging,  and  adjustment  problems)  for  both  Black 
and  White  student  subgroups. 
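
The text does not give the formula it used, so the following is a sketch under the assumption that the common design-effect formula, 1 + (average cluster size - 1) x ICC, applies, with the Level 1 variance of a logistic outcome fixed at pi^2 / 3 (about 3.29). The function names and the example inputs are hypothetical and are not the study's values.

```python
import math

# Level 1 variance of the standard logistic distribution (approximately 3.29)
LOGISTIC_LEVEL1_VARIANCE = math.pi ** 2 / 3

def latent_icc(between_variance):
    """ICC for a logistic multilevel model, given the school-level variance."""
    return between_variance / (between_variance + LOGISTIC_LEVEL1_VARIANCE)

def design_effect(icc, avg_cluster_size):
    """Design effect under the assumed formula 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

# Hypothetical inputs: ICC of .05 with 101 students per school on average
deff = design_effect(0.05, 101)
print(round(deff, 2))      # 6.0
print(deff > 2.0)          # True: exceeds the 2.0 cutoff cited in the text
```

Because the design effect grows with cluster size, even a small ICC can exceed the 2.0 cutoff when schools contribute many students each.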

Thereafter,  a  stepwise  approach  to  model  building  was  taken, 
such  that  the  multilevel  models  were  built  one  variable  and  one 
level  at  a  time  in  order  to  be  sensitive  to  the  stability  of  findings 
with  and  without  nonsignificant  effects  (Raudenbush  &  Bryk, 
2002). For all outcome variables, we fit logistic multilevel models,
treating the indicators of the latent outcome variables as categorical
(ordinal), and employing weighted least squares estimation (WLSMV). We
generated standardized coefficients as an effect size to allow readers
to assess the strength of the associations identified and their
practical meaning (Nieminen, Lehtiniemi, Vähäkangas, Huusko, & Rautio,
2013). Degree of model fit was gauged by the chi-square statistic (χ2),
comparative fit index (CFI; Bentler, 1990), non-normed fit index (NNFI,
also known as the Tucker-Lewis Index [TLI]; Bentler & Bonett, 1980),
and the root-mean-square error of approximation (RMSEA; Steiger & Lind,
1980) with its 90% confidence interval. Adequate model fit was
indicated by a nonsignificant chi-square test (p > .05), CFI > .95,
TLI > .95, and RMSEA < .05. With large sample sizes, the chi-square
test is known to be overly sensitive (Marsh, Balla, & McDonald, 1988).
Alternative fit indices based on principles of parsimony (i.e., RMSEA)
were therefore referenced to make decisions regarding competing models
(Browne & Cudeck, 1992).
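
The fit cutoffs above can be written as a small check. This is a minimal sketch: the function name `adequate_fit` is hypothetical, the chi-square criterion is deliberately left out because of its sensitivity to sample size, and the inputs are the CFI, TLI, and RMSEA values reported in the note to Table 3.

```python
def adequate_fit(cfi, tli, rmsea):
    """Apply the stated cutoffs: CFI > .95, TLI > .95, RMSEA < .05."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.05

# (CFI, TLI, RMSEA) as reported in the note to Table 3
reported = {
    "Black": (0.99, 0.99, 0.02),
    "White": (0.99, 0.99, 0.03),
}

fit_ok = {group: adequate_fit(*indices) for group, indices in reported.items()}
print(fit_ok)
```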

We  conducted  a  stratified  analysis  with  Black  and  White 
student  samples  modeled  separately.  The  three  outcome  vari¬ 
ables  (i.e.,  equity,  belonging,  and  adjustment  problems)  were 
modeled  as  latent  variables  using  items  measured  at  the  student 
level  (Level  1).  Predictors  included  at  Level  1  were  grade-level, 
male  gender,  and  student  SES.  Continuous  Level  1  covariates 
were  group-mean  centered  to  allow  for  assessment  of  between- 
groups  differences  (Croninger,  2013).  At  Level  2,  we  included 
the  racial  gap  in  out-of-school  suspension  as  the  primary  inde¬ 
pendent  variable  of  interest.  We  also  included  overall  suspen¬ 
sion  rate,  percentage  of  students  receiving  FARMs,  and  the 
NGV  school  diversity  statistic  as  covariates  of  interest,  and  the 
MDS3  intervention  condition  as  a  control.  All  continuous  Level 
2  variables  were  grand-mean  centered.  To  examine  whether  the 
racial  gap  in  suspension  risk  was  associated  with  Black  stu¬ 
dents’  perceptions  of  equity,  belonging,  and  adjustment  prob¬ 
lems,  we  examined  the  between  model  effects  of  the  racial  gap 
on  all  three  latent  outcome  variables.  The  model  included  the 
Level  1  and  Level  2  predictor  variables’  main  effects. 
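
The two centering schemes can be sketched as follows. The function names and the toy values are hypothetical; the sketch shows only the arithmetic of subtracting a school's own mean (Level 1) versus the overall mean across schools (Level 2).

```python
from collections import defaultdict

def group_mean_center(values, groups):
    """Subtract each group's own mean from its members' values (Level 1)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group]
            for value, group in zip(values, groups)]

def grand_mean_center(values):
    """Subtract the overall mean from every value (Level 2)."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]

# Hypothetical student-level scores nested in two schools
student_scores = [1.0, 3.0, 10.0, 14.0]
school_ids = ["A", "A", "B", "B"]
print(group_mean_center(student_scores, school_ids))  # [-1.0, 1.0, -2.0, 2.0]

# Hypothetical school-level values, one per school
school_values = [10.0, 20.0, 60.0]
print(grand_mean_center(school_values))  # [-20.0, -10.0, 30.0]
```

After group-mean centering, every school's mean on the covariate is zero, so the Level 1 coefficient reflects only within-school differences.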

Results 

Figure 1 and Table 3 present the results for the multilevel models
examining student perceptions of school equity and belonging, as well
as self-reported adjustment problems. We assessed improvement in the
percentage of between-school variance explained by the final model,
which included all covariates plus the racial gap in out-of-school
suspension risk indicator, over the model with covariates only. The
absolute difference in the percentage of between-school variance
explained was, for Black students, +55.6% for equity, +20.1% for school
belonging, and +20.6% for adjustment problems; and for White students,
+0.1% for equity, -1.0% for school belonging, and 0.0% for adjustment
problems. The change in R2 for the between model was, for Black
students, +.28 for equity, +.22 for school belonging, and +.43 for
adjustment problems; and for White students, 0.01 for equity, -0.01 for
school belonging, and 0.00 for adjustment problems.

Perceived  Equity 

Racial gap in out-of-school suspension. As hypothesized, the analyses
suggested that there was a significant negative association between
Black-White out-of-school suspension gap and perceived equity for Black
students only (γ = -.54, p < .001). For White students, a
nonsignificant negative association was found.

Level 1 covariates. The significant finding for Black students held
even when adjusting for Level 1 demographic covariates (male gender
γ = .07, p = .010; grade level and student SES, all ns). For White
students, male gender (γ = .08, p < .001) and SES (γ = .08, p < .001)
were positively associated with perceived equity, whereas there was no
significant association between perceived equity and grade-level.

Level 2 covariates. At the school-level, student diversity was the
only covariate of interest that was significantly related to Black
students’ perceptions of equity (diversity γ = .56, p < .001; FARMs
rate, suspension rate, ns). In contrast, both FARMs (γ = -.59,
p < .001) and suspension rate (γ = -.32, p = .013) were significantly
negatively associated with White students’ perceptions of school
equity, whereas student diversity was not significantly associated with
White students’ perceived equity.

School  Belonging 

Racial gap in out-of-school suspension. We found a statistically
significant negative association between schools’ Black-White
suspension gap and Black students’ sense of school belonging, as
hypothesized (γ = -0.50, p < .001). For White students, a
nonsignificant, positive association with sense of school belonging was
found.

Level 1 covariates. The association between the Black-White suspension
gap and Black students’ reports of school belonging remained
significant even after accounting for other demographic factors.
Student SES was not significantly associated with school belonging, but
grade level was (γ = -0.04, p = .001), indicating that older students
reported lower levels of belonging. Male gender was also significantly
positively associated (γ = 0.19, p < .001), indicating that males
reported significantly higher levels of school belonging than females.
Among White students, male gender and SES were significantly positively
associated (both γ = 0.13, p < .001), whereas grade-level was
negatively associated (γ = -0.04, p < .001).

Level 2 covariates. FARMs was significantly negatively associated with
Black students’ (γ = -0.41, p = .028) and White students’ (γ = -0.60,
p < .001) perceptions of school belonging, indicating that both student
groups perceived lower levels of school belonging in low-SES schools.
Student diversity was positively associated with Black students’
perceptions of school belonging (γ = 0.46, p = .011), but not
significantly associated with White students’ perceptions. Suspension
rates were not significantly associated with belonging among Black
students or White students.

Figure 1. Results for the two-level, confirmatory factor analysis with
covariates are noted B for the Black subgroup and W for the White
subgroup. Error terms and correlations between latent variables are not
depicted (see Table 3 for full results). Black-White suspension gap is
the excess risk of out-of-school suspension among Black students
relative to White students, where values greater than zero indicate
higher suspension risk and values lower than zero indicate lower
suspension among Black students relative to White students. Suspension
rate is the percentage of the entire school’s student enrollment who
received one or more suspensions in the 2011-12 school year. FARMs is
the percentage of students receiving free or reduced price meals.
Racial/ethnic diversity is the normalized generalized variance (NGV)
statistic, which reflects the racial and ethnic heterogeneity diversity
of the student enrollment between 0.0 and 1.0, where higher proportion
reflects greater diversity. Eq_j = Equity latent variable on the
between-level; Be_j = Belonging latent variable on the between-level;
Ad_j = Adjustment problems latent variable on the between-level.
Eq_ij = Equity latent variable on the within-level; Be_ij = Belonging
latent variable on the within-level; Ad_ij = Adjustment problems latent
variable on the within-level. All bold coefficients represent
statistical significance; ns indicates nonsignificant associations.
* p < .05. ** p < .01. *** p < .001.

Adjustment  Problems 

Racial gap in out-of-school suspension. As hypothesized, we found that
Black students reported higher levels of adjustment problems in schools
with higher Black-White suspension gaps (γ = .77, p < .001). In
contrast, we found no significant association between Black-White
suspension gap and adjustment problems among White students.

Level 1 covariates. The significant finding of an association between
the Black-White suspension gap and adjustment problems among Black
students held even when accounting for male gender (γ = -.19,
p < .001), grade-level (γ = -.08, p < .001), and student SES (γ = -.13,
p < .001), which were all significantly inversely associated with Black
students’ reports of adjustment problems. Among White students,
grade-level (γ = -.05, p < .001) and student SES (γ = -.20, p < .001)
were both significantly inversely associated with adjustment problems,
but not male gender (ns).

Level 2 covariates. At the school-level, none of the other covariates
were significantly related to Black students’ perceptions of adjustment
problems (FARMs, suspension rate, and student diversity; all ns). For
White students, the school percentage of enrollment eligible for FARMs
was significantly positively associated with reports of adjustment
problems (γ = .61, p < .001), indicating that White students reported
higher levels of adjustment problems in lower-SES schools. Student
diversity was significantly inversely associated with adjustment
problems (γ = -.27, p = .021), whereas suspension rate was
significantly positively associated with White students’ reports of
adjustment problems (γ = .38, p = .013).
Table 3

Associations of the Three Factor, Two-Level Confirmatory Factor Analysis With Covariates

                                Perceived equity         School belonging         Adjustment problems
Variable                        γ     p    SE   t        γ     p    SE   t        γ     p    SE   t
Black students (n = 7,064)
  Student-level variables
    Male gender                 .07   *    .03  2.58     .19   ***  .03  6.97     -.19  ***  .03  -7.06
    Grade level                 -.01  ns   .01  -.80     -.04  **   .01  -3.20    -.08  ***  .01  -7.07
    SES                         -.03  ns   .02  -1.95    .01   ns   .02  .66      -.13  ***  .02  -8.10
  School-level variables
    Black-White suspension gap  -.54  ***  .14  -3.72    -.50  ***  .14  -3.69    .77   ***  .17  4.44
    FARMs                       -.08  ns   .18  -.46     -.41  *    .19  -2.20    -.04  ns   .21  -.20
    Suspension rate             -.14  ns   .20  -.72     .13   ns   .20  -.67     .34   ns   .22  1.53
    Student diversity           .56   ***  .16  3.53     .46   *    .18  2.55     -.33  ns   .22  -1.54
    Condition                   .04   ns   .17  .27      -.03  ns   .16  -.16     .13   ns   .18  .70
White students (n = 12,662)
  Student-level variables
    Male gender                 .08   ***  .02  3.70     .13   ***  .02  6.87     .03   ns   .02  1.42
    Grade level                 -.01  ns   .01  -1.10    -.04  ***  .01  -5.82    -.05  ***  .01  -6.28
    SES                         .08   ***  .01  8.14     .13   ***  .01  14.22    -.20  ***  .01  21.31
  School-level variables
    Black-White suspension gap  -.11  ns   .13  -.90     .01   ns   .15  .07      -.01  ns   .12  -.12
    FARMs                       -.59  ***  .13  -4.66    -.60  ***  .14  -4.18    .61   ***  .13  4.53
    Suspension rate             -.32  *    .13  -2.47    -.19  ns   .17  -1.08    .38   *    .15  2.48
    Student diversity           .01   ns   .12  .08      -.04  ns   .15  -.24     -.27  *    .12  -2.30
    Condition                   -.02  ns   .10  -.21     -.03  ns   .13  -.21     .18   ns   .11  1.55

Note. Coefficients are standardized. N = 7,064 Black students, 12,662
White students. J = 58 schools. Model fit: Black, χ2(146) = 543.21,
p < .001, CFI = .99, TLI = .99, RMSEA = .02; White, χ2(146) = 1,697.75,
p < .001, CFI = .99, TLI = .99, RMSEA = .03. SES = socioeconomic
status; FARMs = percent of students in the school receiving free and
reduced-priced meals. Unadjusted ICCs: Black student subgroup,
perceived equity = .04, school belonging = .08, adjustment
problems = .02; White student subgroup, perceived equity = .05, school
belonging = .09, adjustment problems = .03. Design effects for all
latent variables were > 2.0 for both groups. Absolute difference in the
percentage of between-school variance explained by the final model with
excess Black-White suspension risk and covariates, relative to the
model with covariates only, is, for Black students, +55.6% for equity,
+20.1% for school belonging, and +20.6% for adjustment problems, and
for White students, +.1% for equity, -1.0% for school belonging, and
.0% for adjustment problems.

* p < .05. ** p < .01. *** p < .001. ns = nonsignificant.


Discussion 

A  large  and  growing  literature  examines  the  potentially  harmful 
effects  of  out-of-school  suspension,  which  disproportionately  re¬ 
moves  Black  youth  from  U.S.  schools  (e.g.,  Fabelo  et  al.,  2011). 
The  present  study  is,  to  our  knowledge,  the  first  quantitative 
analysis  to  explore  the  contextual  effects  of  discipline  disparities 
on  students’  perceptions  of  themselves  and  their  schools,  regard¬ 
less  of  whether  they  have  been  suspended  or  not.  It  is  novel  in  its 
use  of  multilevel  analysis  to  examine  school  discipline  disparities 
as  a  feature  of  the  school  context  negatively  associated  with 
protective  factors  such  as  perceived  equity  and  school  belonging, 
and  positively  associated  with  adjustment  problems. 

To explore the degree of discipline disparity as an attribute of each
school, schools were characterized utilizing an objective indicator of
Black students’ risk of out-of-school suspension relative to White
students’ risk in the 2011-12 school year. Of note, only four of the
study’s 58 schools had lower Black student risk of suspension; in all
other schools, suspension risk was greater for Black students. We then
examined schools’ racial gaps in out-of-school suspension in relation
to Black and White students’ perceived school equity, school belonging,
and adjustment problems in the subsequent school year (2012-13). The
study findings shed light on how racial differences in schools’ use of
out-of-school suspension may be negatively associated with Black
students’ experience of supportive school climate and, in turn,
positively associated with adjustment problems. The findings have
implications for educational practice and policy to support Black youth
in schools marked by inequality in discipline practices.
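
The school-level indicator can be sketched as a risk difference. This assumes the "excess risk" described in the Figure 1 caption is the Black suspension risk minus the White suspension risk; the function name `suspension_gap` and the counts are hypothetical and are not data from the study.

```python
def suspension_gap(black_suspended, black_enrolled,
                   white_suspended, white_enrolled):
    """Risk difference: Black suspension risk minus White suspension risk."""
    black_risk = black_suspended / black_enrolled
    white_risk = white_suspended / white_enrolled
    return black_risk - white_risk

# Hypothetical school: 30 of 200 Black students and 20 of 400 White students
# received one or more out-of-school suspensions
gap = suspension_gap(30, 200, 20, 400)
print(round(gap, 2))  # 0.1, i.e., Black students' risk is 10 points higher
```

Under this reading, a value above zero marks a school where Black students face higher suspension risk, and a value below zero marks one where their risk is lower.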

Perceived  Equity 

As  hypothesized,  schools’  Black-White  suspension  gap  was 
significantly  inversely  associated  with  Black  students’  perceptions 
of  school  equity,  even  when  accounting  for  a  host  of  other  student 
and  school-level  covariates,  whereas  White  students’  perceptions 
of  school  equity  were  not  significantly  associated  with  their 
school’s Black-White suspension gap. This finding suggests that
Black  students  perceived  lower  levels  of  school  equity  (i.e.,  fair 
and  inclusive  treatment  of  students  by  schools  regardless  of  race, 
gender,  SES,  and  cultural  background)  in  schools  marked  by 
higher  levels  of  racial  disparity  in  suspension  rates.  This  interpre¬ 
tation  is  supported  by  an  extensive  literature  which  suggests  that 
youth  of  color  and  other  marginalized  groups  (e.g.,  sexual  minor¬ 
ities)  are  more  likely  to  perceive  discriminatory  treatment  in 
schools  (Benner  &  Graham,  2013;  LaFromboise,  Hoyt,  Oliver,  & 
Whitbeck,  2006;  Le  &  Stockdale,  2011;  McLaughlin,  Hatzen- 
buehler,  &  Keyes,  2010;  Tummala-Narra  &  Claudius,  2013).  Prior 
research  highlights  the  important  role  of  school  norms  of  fairness, 
inclusion,  and  respect  for  diversity  in  fostering  a  safe  and  positive 
school  climate  (Thapa,  Cohen,  Guffey,  &  Higgins-D’Alessandro, 
2013).  Our  finding  suggests  that,  in  schools  with  larger  discipline 
disparities,  Black  students  may  perceive  a  more  negative  school 
climate than their White classmates within the same schools; this,
in turn, may have implications for the wide range of emotional,
psychological,  behavioral,  and  physical  health  outcomes  that  have 
been  linked  to  positive  school  climate  (Thapa  et  al.,  2013). 

Contrary  to  our  hypotheses,  grade-level,  student  SES,  school 
SES  (FARMs),  and  suspension  rate  were  not  significantly  associ¬ 
ated  with  perceived  equity  in  the  Black  student  sample.  These 
non-significant  results  suggest  the  salience  of  racial  disparity  in 
school  discipline  as  it  relates  to  Black  students’  perceptions  of 
school  equity,  regardless  of  student  gender,  SES,  or  grade-level — 
and regardless of the school’s overall suspension rate and proportion
of low-SES students (FARMs). The finding of a positive significant
association  of  student  diversity  with  perceived  equity  among  Black 
students  in  this  model  was  particularly  surprising,  as  it  challenges 
prior  research  that  has  found  school  diversity  (i.e.,  both  heteroge¬ 
neity  diversity  and  minority  concentration)  to  be  negatively  asso¬ 
ciated  with  perceived  school  equity  (Bottiani,  Bradshaw,  &  Men- 
delson,  2016;  Debnam,  Johnson,  Waasdorp,  &  Bradshaw,  2014). 
The  negative  association  between  diversity  and  perceived  equity 
was  significantly  less  among  Black  students  relative  to  White 
students  in  Bottiani  and  colleagues’  (2016)  analysis.  Neither  of  the 
two  studies  stratified  their  analyses  by  student  race  (which  was 
done  in  this  analysis),  but  instead  looked  at  statistical  interactions. 
This  could  possibly  account  for  the  difference  in  these  findings. 
Because  we  also  found  a  significant,  positive,  school-level  corre¬ 
lation  between  the  racial  gap  in  out-of-school  suspension  (which 
could  be  viewed  as  an  indicator  of  inequity)  and  student  diversity 
(shown  in  Table  2),  further  research  exploring  how  school  diver¬ 
sity,  perceived  equity,  and  school  discipline  disparities  may  oper¬ 
ate  in  concert,  and  vary  by  students’  position  of  race  in  the  school, 
is  needed  to  replicate  and  further  understand  these  conflicting 
findings. 

School  Belonging 

We  found  in  this  study  that  greater  racial  gaps  in  out-of-school 
suspension  risk  were  associated  with  significantly  lower  levels  of 
reported  school  belonging  among  Black  students,  whereas  the 
association  was  not  significant  in  the  White  student  sample.  This 
finding  suggests  that,  when  Black  students  are  more  frequently 
removed  from  the  school  than  their  White  classmates,  it  may  send 
a  message  to  all  students  (suspended  or  not)  about  the  degree  to 
which  Black  students  are  welcome  and  accepted  in  the  school 
social  context.  Although  not  directly  related,  research  on  discrim¬ 
ination  at  school  lends  some  support  to  this  interpretation,  as  such 
experiences  have  been  linked  to  students’  interrupted  school  bond¬ 
ing  (Dotterer  &  Lowe,  2015). 

Although  we  did  not  measure  perceived  discrimination  in  this 
study,  it  may  be  worth  exploring  its  potential  to  explain  Black 
students’  lower  levels  of  school  belonging  in  schools  with  racial 
disparities  in  school  discipline  in  future  research.  Theory  suggests 
that  Black  youth  who  experience  discrimination  in  school  may 
develop  identities  in  concert  with  peers  that  are  antagonistic  to 
prosocial  norms  of  the  broader  school  culture  (Fordham  &  Ogbu, 
1986),  which  may  disrupt  their  sense  of  school  belonging.  More¬ 
over,  emerging  research  on  school  racial  climate  and  racial  identity 
processes  has  underscored  the  importance  of  person-context  fit  as 
it  relates  to  school  engagement  among  Black  youth  (Byrd  & 
Chavous,  2011).  Thus,  future  research  directly  examining  whether 
Black youth experience school-level discipline disparities by race as
a form of discrimination, even when not directly impacted (i.e., not
suspended), may be fruitful in advancing our understanding of how the
Black-White discipline gap within a school could influence Black
students’ sense of school belonging. These dynamics likely vary
depending upon the racial and
ethnic  diversity  of  the  school  (Benner  &  Graham,  2013),  which 
is  why  we  accounted  for  school  diversity  in  our  analysis. 
Particularly  because  school  diversity  was  found  to  be  signifi¬ 
cantly  associated  with  Black  students’  school  perceptions  in 
this  model,  future  research  on  discipline  disparities  should 
continue  to  account  for  variation  in  school  racial/ethnic  diver¬ 
sity. 

Adjustment  Problems 

In  this  study,  we  found  that  Black  students’  reports  of  adjust¬ 
ment  problems  were  significantly  positively  associated  with 
Black-White disparity in suspension risk, whereas no significant
association  was  found  with  White  students’  reports  of  adjustment 
problems.  One  possible  explanation  for  this  finding  may  be  the 
potential  role  of  stereotype  threat  (Steele  &  Aronson,  1995).  Ste¬ 
reotype  threat  has  been  defined  as  “the  arousal,  worrying  thoughts, 
and  temporary  cognitive  deficits  evoked  in  situations  where  a 
group  member’s  performance  can  confirm  the  negative  stereotype 
about  their  group’s  ability  in  that  domain”  (Rydell,  Rydell,  & 
Boucher,  2010,  p.  885).  Racial  gaps  in  school  discipline  rates 
within  a  school  may  raise  racial  stigma  to  conscious  awareness 
among  Black  students  within  that  context  (Benner  &  Graham, 
2013).  Thus,  it  is  plausible  that  schools  with  large  enough  racial 
disparities  in  suspension  rates  to  be  apparent  to  students  may 
prompt  responses  during  disciplinary  encounters  akin  to  stereotype 
threat.  A  classroom  disciplinary  interaction  may  trigger  stereotype 
threat  more  readily  in  highly  disproportionate  schools,  which  in 
turn  could  escalate  the  disciplinary  encounter,  resulting  in  a  dis¬ 
ciplinary  sanction  issued  by  a  teacher,  whereas  the  encounter  may 
not  have  unfolded  in  this  way  in  the  absence  of  the  stereotype 
threat  being  activated.  Most  research  on  stereotype  threat  among 
Black  individuals  suggests  the  effects  are  primarily  relevant  to 
academic  performance  (Steele  &  Aronson,  1995;  Steele,  Spencer, 
&  Aronson,  2002);  however,  research  on  stereotype  activation  and 
behavior  has  generally  shown  that  people  subsequently  behave  in 
ways  consistent  with  the  stereotype  (Wheeler  &  Petty,  2001).  This 
is  also  consistent  with  the  inverse  of  the  Pygmalion  Effect — the 
Golem  Effect — which  theorizes  that  lower  expectations  lead  to 
poorer  performance  (Babad,  Inbar,  &  Rosenthal,  1982).  The  acti¬ 
vation  of  stereotype  threat  within  disproportionate  disciplinary 
contexts  is  a  potential  avenue  for  future  research  to  further  eluci¬ 
date  the  current  findings. 

The  racial  gap  in  out-of-school  suspension  was  not  associated 
with  adjustment  problems  among  White  students;  however,  it  is 
noteworthy  that  student  diversity  was  significantly  inversely  asso¬ 
ciated  in  this  subgroup,  suggesting  that  White  students  reported 
lower  levels  of  adjustment  problems  in  schools  with  a  higher 
degree  of  heterogeneity  in  the  racial/ethnic  composition  of  their 
student  enrollment.  This  finding  is  consistent  with  prior  research 
regarding  the  benefits  of  school  diversity  for  nonminority  students 
(e.g.,  Siegel-Hawley,  2012),  and  may  suggest  that  school  diversity 
could  function  to  mitigate  adjustment  problems  for  White  students. 
This  interpretation  would  be  consistent  with  research  finding  that 
contact with students of other races and ethnicities generates more
knowledge  and  awareness  of  different  cultural  backgrounds,  which 
in  turn  increases  empathic  feeling  and  reduces  anxiety  (Pettigrew 
&  Tropp,  2008).  Another  plausible  reason  for  this  finding  may 
simply  be  that  White  students  feel  better  about  themselves  in  more 
diverse  contexts,  possibly  due  to  observed  differential  treatment  by 
school  staff  or  due  to  their  own  self-comparisons.  These  potential 
pathways,  which  may  explain  the  inverse  association  of  diversity 
and  adjustment  problems  among  White  students,  are  important  for 
future  research  to  examine. 

Although  exploring  potential  mediating  mechanisms  represents 
an  important  next  step  for  future  inquiry,  these  processes  are  likely 
bidirectional.  Future  studies  measuring  both  student-reported  data 
and  discipline  disproportionality  rates  at  multiple  time  points  are 
needed  to  better  establish  temporality  and  therefore  inform  our 
understanding  of  the  potential  directionality  of  these  influences. 
For  example,  research  assessing  perceived  discrimination  as  a 
potential  mediating  mechanism  between  school  discipline  dispar¬ 
ity  and  social-emotional  disparities  among  Black  youth  may  help 
to  explain  the  finding  of  higher  levels  of  reported  adjustment 
problems by Black students in schools with larger Black-White
school  discipline  gaps.  A  number  of  studies  have  reported  that 
adolescents  who  perceived  more  racial  or  ethnic  discrimination 
also  reported  more  psychological  distress,  low  self-esteem,  and 
depression  (e.g.,  Benner  &  Kim,  2009;  Brody  et  al.,  2006;  Gross- 
man  &  Liang,  2008;  Prelow,  Danoff-Burg,  Swenson,  &  Pulgiano, 
2004;  Seaton,  Caldwell,  Sellers,  &  Jackson,  2010)  and  more  dis¬ 
crimination  and  externalizing  behaviors  (Bogart  et  al.,  2013)  and 
anger  (Wong,  Eccles,  &  Sameroff,  2003). 

Limitations  and  Strengths 

There are some limitations of this study that warrant consideration
when interpreting the findings. First, the results should be
interpreted with caution because the degree of discipline disparity
may have changed during the 1-year lapse between the collection of
discipline disproportionality data and student-reported outcomes.
We did not exclude ninth graders (28.9% of enrollment in this
sample), who would not have been in the school in the prior year.
In the 2012-13 school year, the state of Maryland’s mobility rate
was 18.4% (with 9.3% entrants and 9.1% withdrawals), which suggests
there may have been additional students not present in the school
from the year prior. Thus, it is likely that a fair percentage of
the school enrollment changed during the 1-year lapse. However,
prior research suggests that discipline disparities remain
relatively stable over time (Noltemeyer & McLoughlin, 2010).
Specifically, in comparing the disproportionality indices from
2009-10, 2010-11, and 2011-12 in the U.S. DOE Civil Rights Data
Collection for the state of Maryland, Porowski et al. (2014) found
that the Black-to-White risk ratio of out-of-school suspension was
fairly stable at about 2.5 to 2.8 during these 3 years. Thus, the
assumption that incoming students experienced a similar rate of
disproportionality as occurred in the year prior is reasonable. An
advantage of using disproportionality data from an earlier time
point is that it helps to demonstrate temporality, at least to some
extent. Although causal inferences cannot be drawn from these
analyses, the fact that the disproportionality data were collected
prior to the data on student perceptions and functioning is
consistent with the study’s central premise that higher rates of
disparate discipline practices would precede Black students’ poorer
psychological adjustment.
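
To make the risk ratio discussed above concrete, the following is a minimal sketch of how a Black-to-White relative risk of out-of-school suspension could be computed for a single school. The function name, argument names, and the enrollment and suspension counts are hypothetical and chosen only for illustration; they are not data from this study or from Porowski et al. (2014), and the study's own disproportionality index may have been computed differently.

```python
def suspension_risk_ratio(black_suspended, black_enrolled,
                          white_suspended, white_enrolled):
    """Return a Black-to-White risk ratio of out-of-school suspension.

    Risk is taken here to be the share of enrolled students in a group
    who were suspended; the ratio divides the Black risk by the White
    risk. All names and inputs are illustrative assumptions.
    """
    if black_enrolled <= 0 or white_enrolled <= 0:
        raise ValueError("enrollment counts must be positive")
    if white_suspended <= 0:
        raise ValueError("ratio is undefined when the White risk is zero")
    black_risk = black_suspended / black_enrolled
    white_risk = white_suspended / white_enrolled
    return black_risk / white_risk


# Hypothetical school: 60 of 400 Black students and 30 of 500 White
# students received an out-of-school suspension.
ratio = suspension_risk_ratio(60, 400, 30, 500)
print(round(ratio, 2))
```

Under this definition, a ratio of 1.0 would indicate equal suspension risk in the two groups, and ratios in the range reported for Maryland (about 2.5 to 2.8) would indicate that Black students were suspended at roughly two and a half times the rate of White students or more.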

Another potential limitation is that the data in this secondary
analysis come from a group randomized controlled trial. This
could have introduced confounding in the analysis; however, for
several reasons, we argue there is very little to no potential for this
to have impacted our findings. Although it is possible that the
positive behavior supports condition schools could have reduced
their use of out-of-school suspension between the 2011-12 year
and the year in which we collected student survey data (2012-13),
it is still unlikely there would have been a differential reduction in
the racial gap in out-of-school suspension risk in the condition
schools. Prior research has found racial disparities to persist even
in schools implementing positive behavioral supports (Vincent,
Sprague, Pavel, Tobin, & Gau, 2015). The likelihood of this is also
minimal given the early implementation stage of school-wide
supports, which are slow to be codified as school practice.
Nevertheless, we treated condition as a potential confounder and
statistically controlled for it in our analyses; none of the
associations were significant.

Another limitation is that we were not able to home in on the role
of gender in the discipline gap in the present analysis. In light of
the striking excess of school punishment of Black males, we
understand the role of gender is critical to consider in examinations
of discipline disproportionality. However, in this analysis, there was
not sufficient justification to exclude Black females from a
school-level characterization of the discipline gap, as this
subpopulation is also affected by discipline disproportionality.
Black females have higher suspension rates than White males at both
elementary and secondary school levels (Losen et al., 2015). Although
beyond the scope of the present study, future studies may seek to run
multiple models with gender-adjusted risk disparities to explore how
the patterns of findings may vary when incorporating
race-by-gender-specific ratios. For example, perhaps higher Black
female to White male risk indicators are significantly associated with
differential levels of adjustment and perceptions of equity and school
belonging in these same comparison groups. Achieving this level of
specificity in the pattern of findings would further bolster the
argument that discipline disparities are discernible to students and
affect their views of the school and themselves.

One last caution to the reader is that the student-level SES proxy,
maternal education (“How far did your mother go in school?”), did
not specify to survey participants whether the “graduated from
college” response option meant graduating with a 4-year degree or a
2-year degree. The phrase “graduated from college” may be subject to
interpretation based on one’s background. For example, “graduated
from college” may be assumed to mean a 4-year bachelor’s degree;
however, it is possible that students responding to the survey
understood “graduating from college” to mean obtaining a 2-year
associate’s degree or another community college certificate. Although
this method of discerning SES has weaknesses, maternal education is
nonetheless a commonly used measure of SES (Krieger, Williams, & Moss,
1997); its inclusion in our models was theoretically and statistically
informative and preferable to dropping it entirely.

Despite the study’s limitations, we believe the significant
associations found between school-level discipline disparities and
Black students’ reports of adjustment problems and perceptions of
school belonging and equity suggest that a new methodological
focus on discipline disparity as a school contextual factor is
needed. Methods to examine the effects of the school social
context of the discipline gap should quantify the gap at the school
level and employ multilevel methodology, as well as incorporate
mixed methods approaches to further clarify findings. Prior research
on this topic is almost exclusively qualitative, and no
studies have directly examined data from different sources to
validate Black students’ school perceptions in relation to
objectively differential school practices. Although we were not able
to explore the pathways mediating the associations found, such as
perceived discrimination (as we have hypothesized in our
discussion), the study nonetheless breaks new ground in examining an
important developmental determinant that may be uniquely relevant
to Black youth and play a role in explaining seemingly
intractable school inequalities. We believe these findings suggest
that discipline disparities may extend beyond the direct impact on
suspended students; additional research is needed to better
understand the within-school social and ecological implications of
racialized punishment on Black students’ development.

Practice  Implications  and  Conclusions 

This study explored an important gap in our knowledge regarding
the influence of Black-White suspension risk disparities on Black and
White students’ perceived school equity, school belonging, and
self-reported adjustment problems. We found that Black students’
perceptions of equity and school belonging were significantly inversely
associated with Black-White suspension risk disparity, whereas no
significant association was found in the White student sample.
Moreover, we found that Black students reported higher rates of
adjustment problems in schools with higher Black-White suspension
disparities, whereas this association was nonsignificant for White
students. These findings raise a number of key questions for future
research about potential mediating mechanisms; they also prompt further
consideration of appropriate ways to measure school climate to account
for the potential influence of differential school discipline in
students’ experience of school climate.

The  results  highlight  the  need  for  more  research  on  interventions 
that  can  ultimately  eliminate  the  discipline  gap.  For  example,  it 
may  be  possible  to  effectively  coach  teachers  and  administrators  to 
implement  alternative  responses  during  vulnerable  decision  points 
in  their  disciplinary  encounters  with  Black  students  (McIntosh, 
Girvan,  Horner,  &  Smolkowski,  2014).  Interventions  targeting
school  staff  mindfulness  and  stress  management  (e.g.,  Bottiani  et 
al.,  2012;  Jennings,  Frank,  Snowberg,  Coccia,  &  Greenberg,  2013) 
may  also  have  potential  to  restrict  bias  in  disciplinary  interactions. 

Despite longstanding efforts to curb school inequity, discipline
disparities persist. This study’s findings suggest that, in addition to
introducing alternatives to suspension (e.g., restorative justice
programming) and equity-focused interventions to eliminate the gap
(e.g., culturally responsive classroom behavior management),
more immediate supports for Black youth in schools with highly
differential discipline practices may be needed. Students’
experiences of equitable and inclusive school climate are an important
potential target for improvement by school administrators, teachers,
and school counseling staff, particularly in light of research
suggesting that school climate is malleable to intervention
(Bradshaw, Koth, Thornton, & Leaf, 2009). Specifically, the direct
engagement of Black youth in efforts to address their schools’
differential discipline practices may have potential to alter their
perceptions of belonging, inclusion, and fair treatment at the
school, in addition to potentially contributing to changes in school
discipline practices. Work by Day-Vines and Terriquez (2008)
suggests that engaging Black and Latino youth in a stakeholder
taskforce with decision-making power to broach the issue of
excessive suspensions may serve as a strengths-based framework
consistent with positive youth development principles for
stimulating leadership, self-management, resiliency, and social capital
among participating youth. The findings of our study suggest the
necessity of similar approaches that not only aim to stem the
discipline gap (which may be a longer term goal), but that can also
immediately engage Black youth in a dialogue about their
perceptions of their schools’ discipline practices. Initiatives to
broach the issue of the discipline gap with Black students in
high-disparity schools have potential to disrupt harmful perceptions
of the school social context as unfair and unaccepting by
demonstrating respect for the perspectives of Black youth and some
readiness to change.

Teacher Behavior and Peer Liking and Disliking: The Teacher as a Social
Referent for Peer Status

According to social referencing theory, cues peers take from positive and negative teacher behavior
toward a student affect the student’s peer liking and disliking status. The present study was the first to
test the hypothesized mediation model connecting teacher behavior with peer liking and disliking status,
via peer perceptions of teacher liking and disliking for the student. We used a longitudinal design and
controlled for peer perceptions of student behavior. A sample of 1,420 5th-grade students (mean age = 10.60 years)
from 56 classes completed sociometric questionnaires at 3 time points within 1 school year. At the first
time point, video data were also recorded, and teacher behavior toward specific students was coded. A
multilevel path analysis showed that teachers did function as social referents for peer status but only
through their negative behavior toward a student. Negative teacher behavior was associated with peer
perceptions of the teacher’s disliking for the student 3 months later, which in turn predicted peers’
disliking of the student 6 months later. Findings suggest that teachers play a prominent role in peer
relationships, particularly in peer disliking. For practice, this suggests that it may be important for
teachers to refrain from openly negative behavior toward students, particularly those at risk of peer
rejection.

Keywords:  peer  status,  teacher  behavior,  social  referencing,  peer  reputation  of  teacher  (dis)liking 


The teacher, as the one professional close to the peer society but
still not a part of it, may play a crucial role in shaping peer
relations (Farmer, McAuliffe Lines, & Hamm, 2011). Besides
supporting a classroom climate that stimulates positive peer
relationships (e.g., Gest & Rodkin, 2011; Mikami, Griggs, Reuland, &
Gregory, 2012) and shapes social behavior and academic achievement
norms (e.g., Gest & Rodkin, 2011; Hamm, Farmer, Lambert,
& Gravelle, 2014), teachers may also function as a social referent
for students’ peer liking and disliking status (Hughes, Cavell, &
Willson, 2001). Peer liking and disliking status refers to the extent
to which students are accepted versus rejected by their classmates
(Cillessen & Mayeux, 2004; Rubin, Bukowski, & Parker, 2006)
and is strongly related to students’ academic (Flook, Repetti, &
Ullman, 2005; Wentzel, 2005) and social development (Ladd,
2006; Ladd & Troop-Gordon, 2003).

Generally, social referencing describes how a child refers to a
significant other (usually a parent) for cues regarding how to react
in an ambiguous situation (Feinman, 1982; Walden & Ogan,
1988). In a classroom, the teacher is most likely the adult that
students turn to for these cues (Hughes et al., 2001; McAuliffe,
Hubbard, & Romano, 2009), and the teacher’s behavior toward an
individual student provides the student’s peers with information
about the likability of the student. In this sense, the teacher
functions as an affective model (Bandura, 1992) for how to feel
about a student. At the same time, the social referencing mechanism
also requires an active role of peers, who perceive their
teacher’s behavior toward another student and then interpret it in
terms of their own affective evaluation of the student (Hughes, Im,
& Wehrly, 2014). So far, however, no studies have investigated the
entire social referencing mechanism from teacher behavior via
peer perceptions to peer status.

The present study addresses this gap in knowledge and
investigates (a) which teacher behaviors are related to
peer-perceived teacher liking and disliking, and how these peer
perceptions relate to liking and disliking status (basic social
referencing hypothesis), and (b) whether the association
between teacher behavior and peer status is fully mediated by
peer-perceived teacher liking and disliking (strong social
referencing hypothesis). The present study adds to our understanding
of the teacher’s role in peer relations and, in particular,
whether and which teacher behaviors are associated with peer
liking and disliking status. For teachers, an understanding of
how they, deliberately or not, inform their students’ peer status
may also be an enriching element to their repertoire of strategies
to optimize student development.

The  Teacher  as  a  Social  Referent 

Although  no  studies  have  investigated  associations  between 
teacher  behavior,  peer-perceived  teacher-student  relationships,  and 
peer  status  at  the  same  time,  several  lines  of  research  partially 
support  the  social  referencing  mechanism. 

Teacher  Behavior  and  Peer  Status 

First, multiple studies have shown associations between
teacher behavior toward specific students and the students’ peer
status. Based on an experimental study, Flanders and Havumaki
(1960) concluded that teacher praise affected peers’ sociometric
choices for the student. White and colleagues (White & Jones,
2000; White & Kistner, 1992; White, Sherman, & Jones, 1996)
also performed a series of experimental studies that featured
video vignettes of classroom interaction. Student behavior was
kept constant, but teacher behavior toward a target student was
manipulated to vary from derogatory to positive. Kindergarten
through 2nd-grade participants watched the video and rated
their liking of the students. In the positive feedback condition,
the target was evaluated more positively than in the derogatory
feedback condition (White & Kistner, 1992), even when
participants received information on the target’s peer reputation
(White & Jones, 2000b; White et al., 1996). In a more
naturalistic study, McAuliffe et al. (2009) studied how positive and
negative teacher comments predicted how students were
perceived by their peers. In line with White and colleagues’
findings, McAuliffe et al. also found that negative teacher
behavior was positively associated with peer disliking. However,
in contrast with their expectations, positive teacher behavior
was negatively associated with peer liking. They explained
this as either an indication of a teacher’s pet effect,
when students were seen as the teacher’s favorite and, therefore,
less liked (see Babad, 1995, 2009), or the result of an
intervention program in which teachers positively addressed
students who demonstrated undesirable behavior.

Peer-Perceived  Teacher  Liking  and  Disliking  and 
Peer  Status 

Associations of peer perceptions of teacher (dis)liking of students
with students’ peer status (the second and third components
of social referencing) were investigated by Hughes and colleagues
(2001, 2014; Hughes, Zhang, & Hill, 2006). In a series of
studies with 1st- through 4th-grade students, they examined the
role of peer-perceived teacher support in peer acceptance. They
used the term peer reputation to refer to combined perceptions of
all classroom peers (Hughes et al., 2014; cf. Hymel, Wagner, &
Butler, 1990). Students who had a stronger peer reputation of
teacher support also had higher peer likability. In one study,
Hughes et al. (2001) also investigated peer reputation of teacher
conflict and found that it was positively associated with peer
disliking but was not associated with peer liking. Moreover, teacher
support was a stronger (negative) predictor of peer disliking than
teacher conflict.


Teacher  Reports  of  the  Teacher-Student  Relationship 

Two  final  lines  of  research  are  more  distantly  related  to  the 
social  referencing  mechanism,  as  these  examined  teacher-reported 
teacher-student  relationships.  Chang  et  al.  (2007),  Mercer  and
DeRosier  (2008),  and  Taylor  and  Trickett  (1989)  found  that
teacher  preference  for  students  was  positively  related  to  their  peer 
status,  concurrently  and  after  1  or  2  years.  Additionally,  De  Laet  et 
al.  (2014)  and  Hughes  and  Chen  (2011)  showed  that  teacher- 
reported  support  and  conflict  in  the  relationship  with  a  student 
were  associated  with  the  student’s  likability  with  peers,  although 
Leflot,  Van  Lier,  Verschueren,  Onghena,  and  Colpin  (2011)  did 
not  find  such  an  association. 

Teacher-reported preference, support, and conflict are likely
reflected in the behavior of the teacher toward the students.
However, McAuliffe et al. (2009) found that teacher preference was
associated with negative rather than positive teacher behavior. The
association between teacher preference and peer liking and disliking
status might be because of teachers and students valuing the
same agreeable characteristics in a student (e.g., Howes, Hamilton,
& Matheson, 1994) rather than a social referencing effect.

Connecting  the  Three  Components  of  Social 
Referencing 

Taken  together,  these  studies  suggest  that  teacher  behavior  with 
individual  students  informs  peers  about  the  teacher’s  (dis)liking 
for  the  student,  which  in  turn  affects  peers’  (dis)liking  of  that 
student.  However,  research  has  yet  to  reveal  the  degree  to  which 
peers’  views  of  teacher  (dis)liking  of  the  student  are  related  to  cues 
from  teacher  behavior. 

Cues  in  Teacher  Behavior 

Teachers have many different kinds of interactions with students,
each potentially containing social cues. Whereas prior research
(e.g., McAuliffe et al., 2009; White & Kistner, 1992) has
only linked overall teacher positivity and negativity to peer status,
types of positive and negative behavior may differentially affect
peer status. For example, a positive teacher comment may concern
a student’s academic achievement, agreeable behavior, or a nice
jacket, whereas negative teacher comments could focus on a wrong
answer or aggressive behavior toward a peer. To further qualify
teacher behavior, in this study we distinguished between cognitive
and affective teacher comments (Martin & Briggs, 1986;
Woolfolk, Hughes, & Walkup, 2013; see also Babad, 2009). The
cognitive domain entails giving instructions and providing positive or
negative feedback on the student’s performance, whereas the
affective domain refers to the extent to which the teacher expresses
warmth and liking versus disliking for the student. When
commenting in the affective domain, a teacher directly communicates
positive or negative affect for the student, and thereby acts as an
affective model (Bandura, 1992). Although less directly, when a
teacher comments in the cognitive domain, peers may also view
this as the teacher valuing the student (see Babad, 2009), and thus,
may also use this information in their interpretation of the
likability of the student.

The  Role  of  Student  Behavior

A  large  body  of  research  has  also  shown  that  peer  acceptance 
and  rejection  are  strongly  predicted  by  student  behavior  (e.g., 
Cillessen  &  Mayeux,  2004b;  Newcomb,  Bukowski,  &  Pattee, 
1993;  Rubin  et  al.,  2006).  Thus,  when  examining  the  association 
between  teacher  behavior  and  peer  status,  student  behavior  also 
has  to  be  taken  into  account.  The  present  study  included  the  four 
broadband  categories  of  student  behavior  that,  as  noted  in 
several  reviews,  are  most  consistently  related  to  peer  liking  and 
disliking  status  (aggression,  prosocial  behavior,  social  withdrawal,
and  academic  achievement;  Asher  &  McDonald,  2009;
Cillessen  &  Mayeux,  2004;  Newcomb  et  al.,  1993).  In  general, 
showing  prosocial  behavior  and  good  academic  performance 
are  positively  related  to  peer  liking  and  negatively  to  peer 
disliking,  whereas  aggression  and  withdrawing  from  the  peer 
group  are  positively  associated  with  peer  disliking  (Asher  & 
McDonald,  2009;  Cillessen  &  Mayeux,  2004;  LaFontana  & 
Cillessen,  2002;  Wentzel,  2009). 

In social referencing, student behaviors might relate not only to
peer status but also to teacher behavior and peer-perceived teacher
(dis)liking. For example, student behavior likely influences teacher
behavior, because teachers respond to what students do (e.g.,
Doumen et al., 2008). This may result in more positive teacher
behavior with students who show prosocial behavior and more
negative behavior with aggressive students. Peers’ perceptions of
a student’s behavior may also inform peer perceptions of teacher
(dis)liking. Hughes et al. (2001) found strong associations of peer
reputation of teacher support and conflict with peer-perceived
cooperative and aggressive behavior (r = .70 and .77,
respectively). This may be because of a halo effect (Thorndike, 1920),
as prior peer evaluations of whether a student performs well in
school, helps others, or shows aggressive behavior might bias
other peer evaluations (Moskowitz, 2005), in this case teacher
liking versus disliking.

To  separate  the  teacher’s  role  in  peer  status  from  other  processes 
occurring  in  the  classroom,  we  controlled  for  peer  perceptions  of 
student  behavior.  In  addition,  student  gender  was  included,  as 
teachers  seem  to  maintain  closer  and  more  positive  relationships 
with  girls  than  with  boys  (e.g.,  Hughes  et  al.,  2001;  McCormick  &
O’Connor,  2015). 

Developmental  Considerations 

In primary school, teachers seem to have a unique window of
opportunity for affecting students’ peer evaluations, a window that
seems to shrink in adolescence, when the peer world becomes a
social world of its own that is more and more separated from adults
(LaFontana & Cillessen, 2010). For instance, Engels et al. (2016)
found that in Belgian secondary schools, peers’ perceptions of the
teacher-student relationship were not associated with later peer
likability. Therefore, in the present study the focus was on
5th-grade students and their teachers. In this upper-elementary age
group, generally the amount of negative or conflicted
teacher-student interaction increases, whereas the amount of positive
teacher-student interaction decreases (see Esposito, 1999; Jerome,
Hamre, & Pianta, 2009). This heterogeneity in teacher behavior,
combined with elementary-school students’ still-present sensitivity
to the teacher, made this age group a particularly
interesting one in the investigation of the role of the teacher as a
positive as well as negative social referent for peer status.

The  Present  Study 

The present study explicitly tested the social referencing
mechanism in two steps. First, we tested a basic social referencing
hypothesis by examining paths from teacher behavior via peer
perceptions of teacher (dis)liking to peer status, controlling for
peer perceptions of student behavior and gender. Figure 1
represents an overview of the conceptual model. Teacher behavior
encompassed positive and negative teacher behavior in the affective
and cognitive domain. For peer perceptions of teacher (dis)liking of a
student, we investigated peer reputation of teacher liking
(PRTL) and disliking (PRTD; see Peer Reputation of Teacher
Support; Hughes et al., 2014). Peer status comprised both liking
and disliking status to discern how the teacher contributes to
both acceptance and rejection.

As a second step, we tested whether a strong hypothesis of
social referencing was tenable, that is, whether associations
between teacher behavior and peer status are fully mediated by PRTL
and PRTD (i.e., the dashed line in Figure 1 equals zero). Following
social referencing theory (Hughes et al., 2001; Walden & Ogan, 1988b),
we expected positive teacher behavior to be indirectly and
positively related to peer liking and negatively to peer disliking, fully
mediated by (higher) PRTL and (lower) PRTD. For negative
teacher behavior, social referencing predicts a negative association
with peer liking and a positive one with peer disliking, fully
mediated by (lower) PRTL and (higher) PRTD.
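
The two hypotheses can be written compactly in standard mediation notation. What follows is a simplified single-level sketch, not the multilevel path model estimated in the study: the symbols X, M, Y, a, b, and c' are assumptions introduced here for illustration, with X standing for one teacher behavior measure (Time 1), M for PRTL or PRTD (Time 2), and Y for peer liking or disliking status (Time 3). Control variables such as peer-perceived student behavior and gender are omitted for brevity.

```latex
\begin{align}
M &= i_M + aX + \varepsilon_M, \\
Y &= i_Y + c'X + bM + \varepsilon_Y, \\
\text{indirect effect} &= ab.
\end{align}
```

In this notation, the basic social referencing hypothesis corresponds to a nonzero indirect effect (ab differs from 0), and the strong hypothesis adds the restriction that the direct path is zero (c' = 0), which is the dashed line in Figure 1.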

Because we assumed positive and negative cognitive cues in
teacher behavior to model personal affect less directly than
affective cues, we expected a smaller effect of cognitive than of
affective teacher behavior. Furthermore, based on previous research
in which negative behavior was more consistently associated
with peer status than positive behavior (see McAuliffe et al.,
2009; White & Kistner, 1992), we expected negative teacher
behavior to be more strongly related to peer status than positive
behavior. This is in line with negativity bias, or a “propensity to
attend to, learn from, and use negative information far more than
positive information” (Vaish, Grossmann, & Woodward, 2008, p.
383). Similarly, PRTD was expected to have a stronger association
with peer status than PRTL.

Figure 1. Overview of the conceptual model tested in this study. Teacher
behavior comprises positive and negative behavior in the cognitive and
affective domain. Peer perception of student behavior concerns prosocial
behavior, overt and relational aggression, social withdrawal, and academic
achievement. Peer perception of teacher liking and disliking comprises
PRTL and PRTD (Peer Reputation of Teacher Liking and Disliking). Peer
status comprises peer liking and peer disliking. Solid black lines represent
the basic social referencing hypothesis. The grey lines depict the role of
student behavior. The dashed line is expected to be 0 under the strong
social referencing hypothesis.

Table 1
Operationalization of Teacher Positive and Negative Behavior

Affective domain
  Code: Negative
    Indicators: Teacher shows conflict, verbalizes disliking for the
      student or his behavior.
    Examples: Warning a child by calling their name; “Stop that!”;
      “You are being really annoying right now!”; “Please stop that”
  Code: Positive
    Indicators: Teacher shows warmth, verbalizes liking for the
      student or his behavior.
    Examples: Words of affection, like sweetheart, darling, dear,
      my friend; Laughing and joking; “That’s very nice of you”;
      “Thank you”

Cognitive domain
  Code: Negative
    Indicators: Teacher indicates that a student’s contribution is
      incorrect
    Examples: “You got it wrong”; “No”; “That is almost correct”;
      “That’s just not quite right”
  Code: Positive
    Indicators: Teacher acknowledges or praises the student’s
      contribution
    Examples: “Yes”; “Correct”; “That is perfect”;
      “You did an excellent job”

In  addition  to  the  social  referencing  paths,  we  expected  peer 
perceptions  of  prosocial  behavior  and  academic  achievement  to 
relate  to  PRTL  because  of  a  positive  halo  effect  (Thorndike,  1920) 
and  peer-perceived  aggressive  behavior  and  social  withdrawal  to 
predict  PRTD  because  of  a  negative  halo  effect.  Finally,  we 
expected  prosocial  behavior  and  academic  achievement  to  be  pos¬ 
itively  related  to  peer  liking  status  and  negatively  related  to  peer 
disliking  status,  and  aggression  and  social  withdrawal  to  be  posi¬ 
tively  related  to  peer  disliking  status  (Asher  &  McDonald,  2009; 
Newcomb  et  al.,  1993). 

Method 

Design 

Data  were  collected  at  three  time  points  within  one  school  year 
(fall,  winter,  and  spring).  Teacher  behavior  and  peer-perceived 
student  behavior  were  measured  at  Time  1  (T1),  PRTL  and  PRTD 
at  Time  2  (T2),  and  peer  liking  and  disliking  status  at  Time  3  (T3). 
Measuring  the  variables  at  different  time  points  added  to  the 
predictive  power  of  the  proposed  model  (Hayes,  2013),  although 
we  acknowledge  that  we  could  not  draw  conclusions  regarding 
causality. 

Participants 

Teachers  and  their  students  in  56  classes  from  37  elementary 
schools  participated  in  this  study,  which  was  part  of  a  larger 
research  project  on  classroom  climate  in  5th  grade.  Class  sizes 
ranged  from  18  to  34  students  (M  =  26.16,  SD  =  3.77).  Only 
students  for  whom  informed  parental  consent  was  obtained  for 
both  video  recording  and  questionnaires  participated  (1420  out  of 
1466;  96.9%).  The  students’  mean  age  at  T1  was  10.60  years 


(SD  =  0.50),  and  47.3%  were  girls.  According  to  the  classification 
used  by  Statistics  Netherlands  (2012b),  84.3%  of  the  students  were 
Dutch  (both  parents  born  in  the  Netherlands),  5.5%  were  Western 
immigrants  (at  least  one  parent  born  in  another  Western  country) 
and  10.2%  were  non-Western  immigrants  (at  least  one  parent  born 
in  a  non-Western  country).  This  distribution  was  representative  for 
the  areas  in  which  the  schools  were  located  (Statistics  Netherlands, 
2012a). 

In  the  Netherlands,  elementary  school  students  have  the  same 
teacher  for  every  lesson  (approximately  25  hr  a  week)  or  two 
part-time  teachers.  In  14  classes  (25%)  a  single  teacher  was  with 
the  class  all  the  time,  and  42  classes  had  two  teachers.  In  the  latter 
case,  the  teacher  who  spent  the  most  hours  in  the  classroom 
participated,  and  60.7%  of  all  teachers  were  with  their  class  at  least 
4  days  a  week.  At  T1,  teachers  were  on  average  41.33  years  old 
(ranging  from  24.51  to  62.47,  SD  =  11.92),  had  on  average 
15.12  years  of  experience  (ranging  from  1  to  39,  SD  =  11.01),  and 
most  were  women  (n  =  36,  64.3%). 

Measures 

Teacher  behavior.  From  2  hr  of  video  observation,  we  coded 
teacher  behavior  in  each  occurrence  of  public  dyadic  teacher- 
student  interaction.  Using  event  sampling,  every  instance  of 
teacher  behavior  was  coded  when  (a)  at  least  half  of  the  student’s 
classmates  were  present  in  the  classroom  and  (b)  it  was  uttered  in 
connection  to  a  single  student  or  small  group.  Our  approach  of 
coding  only  public  teacher  behavior  excluded  those  teacher  behav¬ 
iors  exhibited  to  a  few  students  with  a  soft  voice  (e.g.,  working 
with  an  individual  student  or  small  group  of  students),  while  others 
were  working  independently  or  in  separate  small  groups.  Each 


1  In  total,  59  classes  participated  in  the  research  project.  Three  classes 
were  not  included  in  the  present  study:  1  because  the  class  dropped  out 
after  T1,  1  because  of  its  exceptionally  large  class  size  and  two  teachers 
present  at  all  times,  and  1  because  the  classroom  teacher  was  on  personal 
leave  at  T2. 
occurrence  was  coded  as  negative,  positive,  or  neutral/other  (see 
McAuliffe  et  al.,  2009),  for  the  affective  and  cognitive  domain 
independently.  For  example,  a  single  teacher  comment  could  be 
positive  in  the  affective  domain  but  negative  in  the  cognitive 
domain.  Table  1  provides  an  overview  of  the  operationalization 
and  examples  for  each  code.  As  we  were  mainly  concerned  with 
positive  and  negative  teacher  cues,  we  used  the  number  of  positive 
affective,  negative  affective,  positive  cognitive,  and  negative  cog¬ 
nitive  comments.  The  neutral/other  code  was  included  to  be  able  to 
code  all  behavior  exhaustively. 

The  first  author  and  two  trained  research  assistants  scored  the 
videos.  Interobserver  agreement  was  first  checked  for  event  oc¬ 
currence;  agreement  that  an  event  had  occurred  ranged  from  81  to 
87%  for  the  pairs  of  observers.  Next,  a  set  of  1624  occurrences 
(8.9%  of  the  total  number  of  fragments)  of  teacher  behavior  was 
coded  with  respect  to  the  content.  Weighted  Cohen’s  κs  for  the 
affective  domain  ranged  from  .72  to  .77  for  the  pairs  of  observers 
(substantial  agreement;  Landis  &  Koch,  1977)  and  ranged  from  .83 
to  .86  for  the  cognitive  domain  (almost  perfect  agreement). 
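
To make the agreement statistic concrete, the following is a minimal sketch of a weighted Cohen's κ for two observers. It is an illustration only: the article does not report which weighting scheme was used, so linear disagreement weights and the category ordering negative < neutral < positive are assumptions, and the function name `weighted_kappa` is hypothetical.

```python
from typing import Hashable, Sequence


def weighted_kappa(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
) -> float:
    """Weighted Cohen's kappa with linear disagreement weights (assumed scheme)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of events")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Joint proportions of (code by rater A, code by rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            obs_dis += weight * observed[i][j]
            exp_dis += weight * row[i] * col[j]
    if exp_dis == 0.0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - obs_dis / exp_dis


# Made-up codes for four events, not data from the study.
codes_a = ["neg", "neg", "pos", "pos"]
codes_b = ["neg", "neu", "pos", "pos"]
kappa = weighted_kappa(codes_a, codes_b, ["neg", "neu", "pos"])
```

With these toy codes the observers disagree on one event by one category step, so κ falls below 1 even though three of the four codes match exactly.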

Peer  perceptions  of  teacher-student  relationship.  PRTL 
and  PRTD  were  each  measured  with  one  peer  nomination  item 
(PRTL:  “Which  classmates  are  liked  most  by  the  teacher?”; 
PRTD:  “Which  classmates  are  liked  least  by  the  teacher?”).  Items 
stated  the  name  of  the  teacher  involved.  Students  nominated  from 
a  list  containing  all  first  names  of  their  classmates.  To  avoid 
sequence  effects  (see  Poulin  &  Dishion,  2008),  the  names  were 
presented  in  a  random  order  that  was  different  for  each  participant. 
Same  and  opposite  sex  nominations  were  allowed,  and  nomina¬ 
tions  were  unlimited,  with  a  minimum  of  one.  Apart  from  them¬ 
selves,  students  could  nominate  any  classmate,  whether  or  not 
present  and  consented.  Nominations  of  nonconsented  students 
were  excluded  from  the  dataset.  We  calculated  a  proportion  score 
for  PRTL  and  PRTD  for  every  student  as  the  number  of  received 
nominations  divided  by  the  number  of  nominators  (i.e.,  the  stu¬ 
dents  who  were  present  and  consented,  reduced  by  1  if  the  student 
was  among  the  nominators). 
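
The proportion score described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code; the function and parameter names are hypothetical, and the example counts are invented.

```python
def proportion_score(
    received: int, n_nominators: int, target_is_nominator: bool
) -> float:
    """Nominations received divided by the number of classmates who could nominate.

    n_nominators counts the students who were present and consented; it is
    reduced by 1 when the target student is one of them, because students
    could not nominate themselves.
    """
    eligible = n_nominators - 1 if target_is_nominator else n_nominators
    if eligible <= 0:
        raise ValueError("no eligible nominators")
    if not 0 <= received <= eligible:
        raise ValueError("received nominations must lie between 0 and eligible")
    return received / eligible


# Invented example: 25 nominators, the target is one of them, 8 nominations.
score_present = proportion_score(8, 25, target_is_nominator=True)
# Same class, but the target was absent and so is not among the nominators.
score_absent = proportion_score(8, 25, target_is_nominator=False)
```

Dividing by the number of eligible nominators rather than by class size keeps scores comparable across classes of different sizes and across students who were absent on the day of data collection.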

Peer  liking  and  disliking  status.  We  measured  peer  liking 
and  disliking  status  by  asking  students  to  nominate  those  peers 
they  liked  most  (“Which  classmates  do  you  like  most?”)  and  peers 
whom  they  liked  least  (“Which  classmates  do  you  like  least?”). 
The  same  procedures  were  applied  as  those  for  PRTL  and  PRTD 
to  gather  the  nominations  and  to  compute  proportion  scores. 

Peer  perceptions  of  student  behavior.  Peer  perceptions  of 
prosocial  behavior,  overt  and  relational  aggression,  social  with¬ 
drawal,  and  academic  achievement  were  measured  using  similar 
sociometric  procedures.  The  prosocial  items  were  “Which  class¬ 
mates  cooperate  well?”  and  “.  .  .  help  other  children?”,  r  =  .75, 
p  <  .001.  Overt  aggression  items  were:  “Which  classmates  call 
other  children  mean  names?”  and  “.  .  .  hit  or  kick  other  children?” 
(r  =  .92,  p  <  .001).  Relational  aggression  items  were:  “Which 
classmates  gossip  about  other  children?”  and  “.  .  .  exclude  other 
children?”,  r  =  .71,  p  <  .001.  The  social  withdrawal  item  was 
“Which  classmates  are  often  by  themselves  during  breaks?”  Peer- 
perceived  academic  achievement  was  measured  using  the  item 
“Which  classmates  receive  high  grades?”  For  each  item,  propor¬ 
tion  scores  were  computed  as  indicated  above.  Prosocial  behavior 
and  overt  and  relational  aggression  were  calculated  as  the  average 
scores  on  the  relevant  items. 


Procedure 

Schools  located  in  middle,  southern,  and  eastern  regions  of  the 
Netherlands  were  recruited  to  participate.  After  a  school’s  princi¬ 
pal  and  the  classroom  teacher  agreed  to  participate,  parents  were 
informed  about  the  study  goals  and  were  asked  for  their  consent 
regarding  their  child’s  participation.  Data  were  collected  in  the  fall 
(T1),  winter  (T2)  and  spring  (T3)  of  the  2012/2013  school  year,  13 
to  15  weeks  (T1-T2)  and  9  to  11  weeks  (T2-T3)  apart.  At  every 
time  point,  all  consented  students  completed  the  questionnaires  on 
netbook  computers  in  their  own  classrooms.  The  students  were 
seated  separately  and  partition  screens  flanked  the  netbooks  to 
prevent  distraction  and  increase  privacy.  Standard  instructions 
were  given  concerning  the  content  of  the  questions  and  confiden¬ 
tial  data  handling.  Two  hours  of  video  were  recorded  on  the  same 
day  the  questionnaires  were  completed.  A  camera  was  positioned 
in  the  back  of  the  classroom  and  teachers  wore  a  microphone,  so 
that  their  verbal  behavior  was  well  audible  to  the  observers.  During 
video  recording,  no  researcher  was  present  in  the  classroom. 
Teachers  were  free  to  follow  their  normal  lesson  plans  but  were 
asked  not  to  schedule  tests  (because  little  interaction  takes  place 
during  tests)  or  individual  student  presentations  (because 
classroom  interactions  then  typically  revolve  around  the  presenting 
student).  After  the  third  measurement  moment,  teachers  received  a 
summary  of  the  findings  for  their  classrooms. 

Analysis 

To  test  our  hypotheses,  two  path  analysis  models  were  tested 
using  Mplus  version  7.2  (Muthén  &  Muthén,  1998-2012).  First,  to 
test  the  basic  social  referencing  hypothesis,  we  specified  a  model 
containing  paths  from  T1  teacher  behavior  variables  to  T2  PRTL 
and  PRTD  and  from  T2  PRTL  and  PRTD  to  T3  peer  liking  and 
disliking  status  (the  solid  black  lines  in  Figure  1),  and  all  paths 
from  T1  student  behavior  variables  to  T2  PRTL  and  PRTD  and  to 
T3  peer  liking  and  disliking  status  (the  gray  part  of  Figure  1). 
Gender  was  added  as  a  dichotomous  predictor  variable  (boys  =  0, 
girls  =  1).  Second,  to  test  the  strong  social  referencing  hypothesis 
of  full  mediation,  paths  from  T1  teacher  behavior  to  T3  peer  status 
were  added  (the  dashed  line  in  Figure  1),  and  the  change  in  model 
fit  was  examined  as  well  as  the  significance  of  the  parameter 
estimates. 

To  account  for  the  nested  data  structure  with  students  clustered 
in  classes,  the  “Complex”  function  in  Mplus  was  used  with  max¬ 
imum  likelihood  estimation  with  robust  standard  errors  (MLR).  All 
predictor  variables,  except  for  gender,  were  class-mean  centered 
(i.e.,  the  class  mean  was  subtracted  from  the  individual  scores)  to 
correct  for  classroom-level  tendencies  to  nominate  more  or  fewer 
students  or  teachers’  tendencies  to  have  more  or  fewer  positive 
and/or  negative  interactions  with  their  students  (see  Marsh  et  al., 
2012).  In  addition,  both  models  contained  all  covariances  among 
the  T1  predictors,  between  T2  PRTL  and  PRTD  and  between  T3 
peer  liking  and  disliking  status. 
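
Class-mean centering, as described above, can be sketched in a few lines. This is a generic illustration with invented scores; the function name `class_mean_center` is hypothetical and the snippet is not the authors' Mplus setup.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def class_mean_center(
    scores: Sequence[float], class_ids: Sequence[Hashable]
) -> List[float]:
    """Subtract each student's class mean from that student's score."""
    if len(scores) != len(class_ids):
        raise ValueError("scores and class_ids must have the same length")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, class_id in zip(scores, class_ids):
        totals[class_id] += score
        counts[class_id] += 1
    means = {class_id: totals[class_id] / counts[class_id] for class_id in totals}
    return [score - means[class_id] for score, class_id in zip(scores, class_ids)]


# Invented example: class "a" has a low base rate, class "b" a high one.
centered = class_mean_center([1.0, 3.0, 10.0, 20.0], ["a", "a", "b", "b"])
```

After centering, every class has a mean of zero, so a student's score expresses standing relative to classmates rather than the class-wide base rate of nominations or teacher comments.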

As  part  of  the  research  project  on  classroom  climate,  25  of  the 
56  classrooms  participated  in  an  intervention  between  T1  and  T2, 
which  involved  increasing  teacher  awareness  of  the  classroom 
social  climate,  rearrangement  of  classroom  seating,  and  teacher 
assignments  with  individual  students.  Therefore,  we  also  tested  for 
path  invariance  across  the  intervention  and  control  conditions.  We 
used  χ²  difference  tests  based  on  log  likelihood  values  and  scaling 
correction  factors  obtained  with  the  MLR  estimator  (see  Satorra, 
2000)  to  compare  the  constrained  (equal  path  estimates  for  both 
conditions)  and  unconstrained  (path  estimates  freely  estimated  for 
both  conditions)  models.  For  model  identification  reasons,  we 
tested  invariance  separately  for  the  prediction  of  PRTL  and  PRTD, 
and  for  peer  liking  and  peer  disliking  status.  Path  invariance  could 
be  assumed  for  peer  liking  and  peer  disliking,  Δχ²(10)  =  7.15,  p  = 
.711.  For  PRTL  and  PRTD,  path  invariance  could  not  be  assumed, 
Δχ²(11)  =  26.82,  p  =  .005,  unless  the  path  from  gender  to  PRTL 
was  allowed  to  vary  across  conditions,  Δχ²(10)  =  17.61,  p  =  .062. 
The  path  was  significant  and  positive  for  both  conditions  but  was 
stronger  in  the  intervention  condition.  Hence,  the  path  from  gender 
to  PRTL  was  estimated  separately  for  both  conditions. 
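
For readers unfamiliar with difference testing under MLR, the sketch below shows the commonly documented computation of a scaled χ² difference from log likelihoods and scaling correction factors. It is an illustration under stated assumptions: the function name and all input values are invented, and the article does not report the log likelihoods, correction factors, or parameter counts of its models.

```python
from typing import Tuple


def scaled_chi2_difference(
    ll_nested: float,
    c_nested: float,
    p_nested: int,
    ll_full: float,
    c_full: float,
    p_full: int,
) -> Tuple[float, int]:
    """Scaled chi-square difference test statistic and its degrees of freedom.

    ll_*: log likelihood, c_*: scaling correction factor, p_*: number of free
    parameters. The nested model is the constrained one (fewer parameters).
    """
    df = p_full - p_nested
    if df <= 0:
        raise ValueError("the nested model must have fewer free parameters")
    # Scaling correction for the difference test.
    cd = (p_nested * c_nested - p_full * c_full) / (p_nested - p_full)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    statistic = -2.0 * (ll_nested - ll_full) / cd
    return statistic, df


# Invented inputs: with both correction factors equal to 1 the statistic
# reduces to the ordinary likelihood ratio test.
plain_stat, plain_df = scaled_chi2_difference(-1005.0, 1.0, 20, -1000.0, 1.0, 30)
scaled_stat, scaled_df = scaled_chi2_difference(-1005.0, 1.2, 20, -1000.0, 1.1, 30)
```

The resulting statistic is referred to a χ² distribution with `df` degrees of freedom; a nonsignificant result supports treating the constrained paths as equal across groups.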

In  a  similar  vein,  classes  that  had  a  single  teacher  or  two 
teachers  can  be  considered  two  subpopulations  in  which  social 
referencing  paths  might  differ.  Using  the  same  procedure  as  ap¬ 
plied  for  model  invariance  across  study  conditions,  we  tested 
model  invariance  for  the  classes  that  had  a  single  teacher  versus 
the  classes  in  which  the  primary  of  two  teachers  participated.  Path 
invariance  could  be  assumed  for  all  outcomes  (PRTL:  Δχ²(5)  = 
4.66,  p  =  .459;  PRTD:  Δχ²(6)  =  6.25,  p  =  .396;  peer  liking: 
Δχ²(4)  =  2.25,  p  =  .690;  and  peer  disliking:  Δχ²(6)  =  4.38,  p  = 
.625). 

Results 

Preliminary  Analyses 

Before  running  the  analyses,  assumptions  were  checked.  Neg¬ 
ative  teacher  behavior  variables,  PRTD,  peer  disliking,  overt  and 
relational  aggression,  and  social  withdrawal  were  positively 
skewed,  which  was  accounted  for  by  the  MLR  estimator.  Assump¬ 
tions  of  univariate  and  multivariate  linearity  and  homoscedasticity 
were  met.  In  total,  49  students  (3.5%)  were  multivariate  outliers;  these 
cases  differed  from  those  without  outliers  in  prosocial  behavior, 
M(outlier)  =  -.08,  M(no  outlier)  =  .00,  t(49.94)  =  3.34,  p  =  .002;  overt 
aggression,  M(outlier)  =  .15,  M(no  outlier)  =  -.01,  t(49.03)  =  3.42, 
p  =  .001;  social  withdrawal,  M(outlier)  =  .06,  M(no  outlier)  =  .00, 
t(49.30)  =  2.37,  p  =  .022;  and  PRTD,  M(outlier)  =  .22,  M(no  outlier)  =  -.01, 
t(48.60)  =  4.67,  p  <  .001.  As  the  scores  of  these  outlier  cases  were 
extreme  but  not  impossible,  we  kept  the  data  intact  and  ran  the 
analyses  twice,  including  and  excluding  outliers. 

Descriptive  statistics  of  the  study  variables  are  displayed  in 
Table  2.  Missing  values  were  all  because  of  absence  on  the  day  of 
data  collection  or  to  students  who  were  no  longer  attending  the 
school.  Teachers  made  more  negative  than  positive  comments  in 
the  affective  domain,  t(1377)  =  6.24,  p  <  .001,  but  more  positive 
than  negative  comments  in  the  cognitive  domain,  t(1377)  =  25.92, 
p  <  .001.  On  average,  students  were  perceived  by  their  peers  as 
being  more  liked  by  the  teacher  (PRTL)  than  disliked  (PRTD), 
t(1410)  =  30.88,  p  <  .001,  and  peer  liking  scores  were  higher  than 
peer  disliking  scores,  t(1405)  =  14.51,  p  <  .001.  The  intraclass 
correlations  (ICC)  represent  the  amount  of  variance  that  can  be 
ascribed  to  the  class  level.  These  ranged  from  0%  (for  social 
withdrawal)  to  56%  (for  PRTL).  In  general,  ICCs  for  positive 
teacher  and  peer  variables  were  higher  than  those  for  negative 
variables  (regarding  teacher  behavior  and  peer  perceptions). 

Table  3  shows  the  bivariate  Spearman  correlations,  which  were 
used  given  the  nonnormality.  Most  correlations  are  in  line  with  the 


social  referencing  hypotheses;  both  types  of  negative  teacher  be¬ 
havior  were  positively  related  to  PRTD,  which  was  negatively 
related  to  peer  liking  and  positively  to  peer  disliking  status.  PRTL 
was  positively  associated  with  peer  liking  and  negatively  with  peer 
disliking  status.  Only  the  correlations  involving  positive  teacher 
behavior  were  not  in  line  with  the  expectations.  Positive  teacher 
behavior  in  the  affective  domain  was  slightly  negatively  related  to 
PRTL  (rs  =  -.11,  p  <  .001)  and  positively  to  PRTD  (rs  =  .09, 
p  =  .001).  Positive  teacher  behavior  in  the  cognitive  domain  was 
positively  related  to  PRTD  (rs  =  .05,  p  =  .047).  Finally,  positive 
teacher  behavior  in  the  affective  domain  was  slightly  positively 
correlated  with  peer  disliking  (rs  =  .06,  p  =  .034). 

Basic  Social  Referencing  Model 

In  the  first  model,  we  tested  all  paths  from  teacher  behavior  to 
PRTL  and  PRTD  and  from  PRTL  and  PRTD  to  peer  liking  and 
disliking  status,  while  controlling  for  peer-perceived  student  be¬ 
havior  and  gender  (see  Figure  2  for  the  path  diagram).  The  model 
fit  was  excellent,  χ²(8)  =  6.95,  p  =  .542,  root  mean  square  error 
of  approximation  (RMSEA)  =  .00,  comparative  fit  index  (CFI)  = 
1.00,  Tucker-Lewis  index  (TLI)  =  1.00,  standardized  root  mean 
square  residual  (SRMR)  =  .00.  Table  4  contains  the  parameter 
estimates  for  this  model.  The  model  explained  40.9%  of  the 
variance  in  PRTL,  59.8%  of  the  variance  in  PRTD,  32.1%  of  the 
variance  in  peer  liking,  and  51.6%  of  the  variance  in  peer  disliking. 

Negative  teacher  behavior  in  both  the  affective  and  cognitive 
domains  predicted  PRTD,  which  in  turn  predicted  peer  disliking 
status.  As  expected,  PRTL  and  PRTD  were  both  predicted  by  the 
student  behavior  variables  and  gender.  PRTL  was  most  strongly 
associated  with  peer-perceived  prosocial  behavior  of  the  student, 
and  girls  were  viewed  as  liked  more  by  the  teacher  than  boys. 
PRTD,  on  the  other  hand,  was  most  strongly  predicted  by  overt 
aggression.  Peer  liking  status  was  most  strongly  predicted  by 
peer-perceived  prosocial  behavior,  but  also  (negatively)  by  peer 
perceptions  of  social  withdrawal.  Peer  disliking  status  was,  next  to 
PRTD,  predicted  by  social  withdrawal  and  overt  aggression,  and 

Table 2

Descriptive Statistics of the Study Variables

Variable                            N       M      SD     Min.   Max.    ICC
----------------------------------  ------  -----  -----  -----  ------  ----
Social referencing variables
  Teacher positive affective T1     1,378   1.42   2.32   .00    20.00   .36
  Teacher negative affective T1     1,378   2.02   3.30   .00    39.00   .11
  Teacher positive cognitive T1     1,378   2.22   2.77   .00    21.00   .20
  Teacher negative cognitive T1     1,378    .46    .98   .00     9.00   .10
  PRTL T2                           1,411    .34    .19   .00     1.00   .56
  PRTD T2                           1,411    .10    .16   .00      .97   .04
  Peer liking T3                    1,406    .21    .12   .00      .67   .18
  Peer disliking T3                 1,406    .12    .15   .00     1.00   .01
Control variables
  Prosocial behavior T1             1,420    .27    .15   .00      .82   .17
  Overt aggression T1               1,420    .12    .18   .00      .98   .02
  Relational aggression T1          1,420    .12    .13   .00      .76   .03
  Social withdrawal T1              1,420    .06    .12   .00      .95   .00
  Academic achievement T1           1,420    .25    .24   .00     1.00   .03

Note. Descriptive statistics reflect scores before centering. ICC = intraclass
correlation coefficient; T1 = Time 1; T2 = Time 2; T3 = Time 3;
PRTL = Peer Reputation of Teacher Liking; PRTD = Peer Reputation of
Teacher Disliking.
Table 3

Bivariate Spearman Correlations Among All Study Variables

Variable                          1       2       3       4       5       6       7       8       9       10      11      12      13
Social referencing variables
 1. Teacher positive affective T1 —
 2. Teacher negative affective T1 .23**   —
 3. Teacher positive cognitive T1 .27**   .12**   —
 4. Teacher negative cognitive T1 .20**   .17**   .33**   —
 5. PRTL T2                       -.11**  -.30**  .01     -.05    —
 6. PRTD T2                       .09**   .29**   .05*    .10**   -.59**  —
 7. Peer liking T3                -.04    -.16**  .00     -.00    .29**   -.30**  —
 8. Peer disliking T3             .06*    .19**   .04     .05     -.38**  .46**   -.51**  —
Control variables
 9. Prosocial behavior T1         -.09**  -.27**  -.00    -.08**  .57**   -.55**  .53**   -.56**  —
10. Overt aggression T1           .12**   .33**   .03     .08**   -.51**  .66**   -.29**  .42**   -.56**  —
11. Relational aggression T1      .08**   .25**   .08**   .07**   -.31**  .40**   -.20**  .35**   -.32**  .53**   —
12. Social withdrawal T1          -.04    -.03    -.02    -.04    -.02    .05     -.28**  .19**   -.23**  .02     -.16**  —
13. Academic achievement T1       -.02    -.12**  -.02    -.09**  .24**   -.27**  .31**   -.31**  .56**   -.22**  -.18**  -.16**  —
14. Gender                        -.14**  -.28**  -.02    -.07*   .47**   -.41**  .08**   -.17**  .35**   -.49**  .02     -.04    -.03

Note. All variables are group mean centered, except for peer (dis)liking. T1 = Time 1; T2 = Time 2; T3 = Time 3; PRTL = Peer Reputation of Teacher
Liking; PRTD = Peer Reputation of Teacher Disliking.

* p < .05. ** p < .01.


was  negatively  related  to  prosocial  behavior.  Boys  were  liked 
somewhat  more  than  girls,  and  girls  were  disliked  somewhat  more 
than  boys.  This  could  be  because  of  the  slightly  skewed  gender 
distribution  in  the  sample.  As  liked-most  nominations  are  mostly 
given  to  same-sex  peers  and  liked-least  nominations  to  opposite 
sex  peers  (e.g.,  Dijkstra,  Lindenberg,  &  Veenstra,  2007;  Rose  & 
Smith,  2009),  with  more  boys  in  the  sample,  boys  were  more  likely 
to  receive  liked-most  nominations  and  less  likely  to  receive  liked- 
least  nominations. 

Appendix  shows  the  results  of  the  model  when  excluding  the 
multivariate  outliers,  which  were  very  similar,  except  that  the  path 
from  negative  teacher  behavior  in  the  affective  domain  to  PRTL 
became  significant,  b  =  -0.002,  p  =  .042.  This  path  was  consistent 
with  social  referencing  theory,  but  its  effect  size  was  relatively 
small  (β  =  -.04).  The  appearance  of  this  path  after  excluding 
outliers  might  indicate  that  the  outliers  increased  error  variance 
and  thus  decreased  power  to  detect  smaller  effects  (Howitt  & 
Cramer,  2011;  Stevens,  2001). 

The  role  of  student  behavior.  To  examine  the  importance  of 
the  covariates  in  the  social  reference  model,  we  tested  the  model 
again  without  peer  perceptions  of  student  behavior.  Although  this 
less  complex  model  showed  a  good  fit,  χ²(8)  =  8.42,  p  =  .394, 
RMSEA  =  .01,  CFI  =  1.00,  TLI  =  1.00,  SRMR  =  .01,  the 
Akaike  Information  Criterion  (AIC;  -7817.30  vs.  -9801.99)  and 
Bayesian  Information  Criterion  (BIC;  -7702.42  vs.  -9561.78) 
indices  showed  better  model  fit  for  the  one  including  the  covariates. 
The  less  complex  model  also  explained  considerably  less  variance; 
only  5.6  and  16.3%  of  the  variance  in  PRTL  and  PRTD,  and  11.0 
and  34.2%  of  the  variance  in  peer  liking  and  disliking.  The  worse 
model  fit  and  the  lower  rate  of  explained  variance  in  the  less 
complex  model  highlight  the  importance  of  considering  student 
behavior  and  gender  as  covariates. 

Strong  Social  Referencing  Model 

Finally,  we  tested  whether  a  strong  hypothesis  of  social 
referencing  was  tenable — that  is,  whether  teacher  behavior  was 


only  indirectly  related  to  peer  status  through  peer  perceptions  of 
teacher  (dis)liking.  Adding  direct  paths  from  the  teacher  behavior 
variables  to  peer  liking  status  did  not  improve  the  model  fit, 
Δχ²(8)  =  6.95,  p  =  .542.  Correspondingly,  none  of  these 
paths  was  significant.  Table  5  shows  the  coefficients  of  the 
indirect  paths  from  the  teacher  behavior  and  student  variables  to 
peer  disliking  status,  mediated  by  PRTD.  The  indirect  path  from 
overt  aggression  to  peer  disliking,  via  PRTD,  was  the  most 
prominent. 
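
As a rough illustration of how such an indirect path is quantified, the sketch below multiplies the two unstandardized coefficients that make up the path from overt aggression to peer disliking via PRTD, using the rounded values printed in Table 4 (B = .51, SE = .04 and B = .31, SE = .06). The first-order (Sobel) standard error is an assumption made for illustration; the article does not state how standard errors of indirect effects were obtained, and these rounded inputs need not reproduce the values in Table 5.

```python
import math
from typing import Tuple


def indirect_effect(a: float, se_a: float, b: float, se_b: float) -> Tuple[float, float]:
    """Product-of-coefficients indirect effect with a first-order (Sobel) SE.

    a: path from predictor to mediator, b: path from mediator to outcome.
    """
    estimate = a * b
    standard_error = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return estimate, standard_error


# Rounded Table 4 values: overt aggression -> PRTD, then PRTD -> peer disliking.
estimate, standard_error = indirect_effect(a=0.51, se_a=0.04, b=0.31, se_b=0.06)
z_value = estimate / standard_error
```

Because the inputs are rounded to two decimals and ignore the clustering correction, the result should be read as an order-of-magnitude check rather than a replication of the reported estimate.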

Discussion 

Social  referencing  theory  posits  that  peers  evaluate  a  student 
based,  in  part,  on  their  observations  of  teacher-student  interaction 
(Hughes  et  al.,  2001;  McAuliffe  et  al.,  2009).  This  study 
was  the  first  to  investigate  all  three  components  of  social  referencing 
together  and  over  time:  teacher  behavior,  peer  perceptions  of  the 
teacher  (dis)liking  of  students,  and  peer  (dis)liking  status.  Further¬ 
more,  this  study  added  to  earlier  research  by  considering  the  added 
value  of  teacher  behavior  over  and  above  well-known  effects  of 
peer  perceptions  of  student  behavior  and  student  gender  (see 
Cillessen  &  Mayeux,  2004;  Newcomb  et  al.,  1993).  We  found 
partial  evidence  for  the  social  referencing  theory,  particularly 
through  a  negative  pathway.  The  more  negative  teacher  comments 
a  student  received,  the  more  peers  thought  the  teacher  disliked  that 
student,  which  resulted  in  a  higher  peer  disliking  status.  As  social 
referencing  theory  predicts  (Hughes  et  al.,  2001),  the  association  of 
teacher  behavior  with  peer  disliking  status  was  fully  mediated  by 
how  peers  perceived  teacher  disliking  of  the  student.  However, 
contrary  to  earlier  findings  (see  Hughes  et  al.,  2001,  2014),  we  did 
not  find  evidence  of  a  positive  social  referencing  pathway  linking 
positive  teacher  behavior  with  peer  liking.  The  present  study 
findings  underline  the  importance  of  controlling  for  student  be¬ 
havior,  and  examining  effects  for  peer  liking  and  disliking  status 
separately. 
[Figure 2: path diagram with three columns of variables: Teacher behavior; Peer perception of teacher-student relationship; Peer status.]

Figure  2.  Path  diagram  of  the  final  model,  containing  all  significant  paths  except  covariances  between 
measures  taken  at  the  same  time  point  (e.g.,  peer  liking  and  disliking  status  at  T3).  Black  arrows  represent 
social  referencing  paths,  grey  arrows  represent  control  variable  paths.  Standardized  path  estimates  are 
provided.  *  p  <  .05.  **  p  <  .01. 


Negative  Teacher  Behavior  and  Teacher  Disliking 

In  concordance  with  McAuliffe  et  al.’s  (2009)  and  White  and 
Kistner’s  (1992)  findings,  negative  teacher  behavior  was  positively 
associated  with  subsequent  peer  disliking  status.  And  in  line  with 
negativity  bias  (Vaish  et  al.,  2008),  negative  teacher  behavior  was 
more  strongly  related  to  peer  status  than  positive  behavior.  As 
predicted  by  social  referencing  theory  (Hughes  et  al.,  2001),  the 
association  of  negative  teacher  behavior  with  peer  status  depended 
on  peer  perceptions  of  teacher  disliking.  Nonetheless,  peer- 
perceived  overt  aggression  was  a  stronger  predictor  of  both  PRTD 
and  peer  disliking  than  teacher  behavior.  This  may  indicate  a 
negative  halo  effect;  when  peers  see  aggressive  student  behavior 
they  are  inclined  to  think  the  teacher  must  dislike  the  student 
(Moskowitz,  2005;  Thorndike,  1920).  Alternatively,  other,  unre¬ 
corded  teacher  behaviors  might  be  more  strongly  connected  to 
PRTD.  To  further  explore  whether  the  origin  of  PRTD  lies  pri¬ 
marily  in  teacher  behavior  or  in  specific  student  behavior,  it  would 
be  interesting  to  interview  students  and  ask  how  they  know  that  the 
teacher  dislikes  a  certain  classmate. 


As  expected,  negative  teacher  behavior  in  the  affective  domain 
predicted  PRTD  more  strongly  than  negative  teacher  behavior  in 
the  cognitive  domain.  The  former  directly  communicates  teacher 
negative  affect  toward  what  a  student  says  or  does,  whereas  this 
affective  valence  is  less  clearly  present  in  the  cognitive  domain. 
When  a  teacher  says  that  an  answer  is  incorrect,  peers  will  not 
necessarily  believe  that  the  teacher  dislikes  the  student.  Still,  the 
negative  information  in  the  cognitive  domain  added  to  peer  per¬ 
ceptions  of  teacher  disliking. 

In  line  with  Hughes  et  al.  (2001),  PRTD  was  associated  with 
peer  disliking  but  not  peer  liking.  Apparently,  having  a  reputation 
of  being  disliked  by  the  teacher  has  the  power  to  amplify  negative 
affect  among  peers  but  does  not  reduce  liking. 

Positive  Teacher  Behavior  and  Teacher  Liking 

Notably,  teacher  behavior  did  not  inform  peer-perceived 
teacher  liking,  and  peers  seemed  to  use  other  information 
sources  to  decide  whether  or  not  the  teacher  likes  a  student.  We 
found  indications  of  a  positive  halo  effect:  peer-perceived 
Table 4

Path Analysis Results of the Basic Social Referencing Model, Predicting Peer-Perceived Teacher Liking and Disliking, and
Peer Status

                               Peer perception of teacher liking and disliking   Peer status
                               PRTL T2                PRTD T2                    Peer liking T3         Peer disliking T3
Predictor                      B (SE)         β       B (SE)         β           B (SE)         β       B (SE)         β
Social referencing variables
  Positive teacher affective T1  .00 (.00)     .00     .00 (.00)     .01
  Negative teacher affective T1  .00 (.00)    -.01     .01 (.00)**   .09
  Positive teacher cognitive T1  .00 (.00)     .05    -.00 (.00)    -.01
  Negative teacher cognitive T1 -.00 (.00)    -.01     .01 (.00)*    .05
  PRTL T2                                                                         .02 (.03)     .02     .05 (.03)     .04
  PRTD T2                                                                        -.05 (.03)    -.06     .31 (.06)**   .33
Control variables
  Prosocial behavior T1          .38 (.04)**   .41    -.10 (.03)**  -.09          .47 (.04)**   .51    -.28 (.04)**  -.26
  Overt aggression T1            .01 (.03)     .02     .51 (.04)**   .57         -.04 (.03)    -.06     .16 (.05)**   .19
  Relational aggression T1      -.13 (.03)**  -.13     .10 (.05)*    .08          .00 (.03)     .00     .08 (.04)*    .07
  Social withdrawal T1           .14 (.03)**   .14     .04 (.03)     .03         -.15 (.02)**  -.14     .33 (.04)**   .27
  Academic achievement T1        .04 (.02)*    .07    -.03 (.02)    -.04         -.04 (.01)**  -.07     .02 (.02)     .03
  Gender                         .07 (.01)**a  .30    -.02 (.01)*   -.08         -.04 (.01)**  -.15     .03 (.01)**   .10
R²                               .41                   .60                        .32                   .52

Note. T1 = Time 1; T2 = Time 2; T3 = Time 3; PRTL = Peer Reputation of Teacher Liking; PRTD = Peer Reputation of Teacher Disliking.
a The path from gender to PRTL is the combined path for both conditions. The parameter estimate was .10 (SE .02, p < .001) for the intervention and .05
(SE .01, p < .001) for the control condition.

* p < .05. ** p < .01.


prosocial behavior and academic performance predicted peer
perceptions of teacher liking. Alternatively, peers might deduce
the teacher’s preferences from other teacher practices, for
example, spending time with students during breaks, contact in the
hallway, or giving students special tasks. Furthermore, positive
teacher behavior in the affective domain may have included
praise for compliance (e.g., “thank you for using your soft
voice”), which teachers possibly use more often with students
who generally show undesirable behavior. The finding that
teacher positive and negative behaviors correlated positively,
which was inconsistent with McAuliffe et al.’s (2009) findings,
might also reflect this. A student who often disrupts a lesson
may receive many negative teacher comments, but also more
positive teacher comments when he does occasionally comply
with the classroom rules. For future research, it seems
worthwhile to interview students and ask why they think that the
teacher likes a classmate, and to unravel which positive
comments actually send the message that the teacher is not pleased
with the student.

Table 5

Indirect Effects on Peer Disliking Via Peer Reputation of
Teacher Disliking

Predictor                         B (SE)           β
Social referencing variables
  Negative teacher affective T1   .001 (.000)**    .03
  Negative teacher cognitive T1   .003 (.001)*     .02
Control variables
  Prosocial behavior              -.032 (.012)**   -.03
  Overt aggression                .159 (.031)**    .19
  Relational aggression           .032 (.015)*     .03
  Gender                          -.007 (.003)*    -.03

Note. T1 = Time 1.
* p < .05. ** p < .01.
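
The indirect effects in Table 5 are consistent with the usual product-of-coefficients logic in path analysis: the path from a predictor to PRTD (Time 2) multiplied by the path from PRTD to peer disliking (Time 3). The following minimal sketch is an illustration under that assumption, not the authors' analysis code. It multiplies the rounded unstandardized estimates printed in Table 4, so its results can differ from Table 5 in the third decimal. The names `A_PATHS`, `B_PATH`, and `indirect_effect` are hypothetical.

```python
# Illustrative sketch only: approximates indirect effects as a * b using the
# rounded unstandardized coefficients printed in Table 4 (not raw estimates).

# b-path: PRTD (Time 2) -> peer disliking (Time 3), from Table 4.
B_PATH = 0.31

# a-paths: predictor (Time 1) -> PRTD (Time 2), from Table 4.
A_PATHS = {
    "Prosocial behavior": -0.10,
    "Overt aggression": 0.51,
    "Relational aggression": 0.10,
    "Gender": -0.02,
}


def indirect_effect(a: float, b: float) -> float:
    """Product-of-coefficients estimate of an indirect effect."""
    return a * b


indirect = {name: indirect_effect(a, B_PATH) for name, a in A_PATHS.items()}

for name, value in indirect.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the products land close to the Table 5 values (for example, roughly .158 for overt aggression against a reported .159), which is what one would expect if the reported effects were computed from unrounded estimates.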

Inconsistent with the findings of Hughes et al. (2001, 2006,
2014), PRTL did not predict peer status after controlling for
peer perceptions of student behavior and student gender. Our
consideration of these covariates might explain this difference.
We did find a positive correlation between PRTL and peer
liking status, but PRTL did not uniquely predict peer liking
after controlling for the covariates. Given the current results, it
seems important to take student behavior and student gender
into account, as not including them might lead to an
overestimation of associations between teacher variables and peer
status. An alternative explanation could be that some students
were viewed as the teacher’s pet (Babad, 2009). Babad (1995)
found that peers either liked the students whom they also
perceived the teacher to like (liked by all) or disliked those
students (teacher’s pet), and that a larger share of the students who
were liked by the teacher were also liked by peers. Similarly, in
our study, of the 194 students who had a relatively high PRTL
score (at least one SD above the mean), 56 (28.9%) also had a
high score on peer liking, but 29 (14.9%) had a high score on
peer disliking. For future research, it would be interesting to
investigate the determinants of being in the “liked by all” versus
the “teacher’s pet” category. Possibly, in Hughes et al.’s
younger student samples, the teacher’s pet phenomenon was
less of an issue, as younger students seem to be more likely to
adopt the teacher’s views than older students (Chang et al.,
2007). Furthermore, in pre-adolescence it becomes less
normative to have a positive relationship with the teacher (see
Esposito, 1999; Jerome et al., 2009). Therefore, in this age group,
some students who are perceived to be liked by the teacher
might be at higher risk of not complying with the norm, and of being
liked less as a result.

Limitations

Our findings must be interpreted in the light of certain
limitations. First, to test the social referencing hypotheses, our model
examined the association between teacher behavior and students’
peer perceptions. However, it is also possible that the associations are
bidirectional, or run entirely in the opposite direction. For example,
teachers may behave differently toward well-accepted students
than toward rejected students (see De Laet et al., 2014).
Furthermore, when peers dislike a student, they might be inclined to
mainly notice behavior that is in agreement with their negative
evaluation (Nickerson, 1998). To further strengthen the evidence
for (or against) social referencing, it would be interesting to test the
stability of the constructs and bidirectional associations using a
cross-lagged panel model.
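
As a hedged illustration of what such a cross-lagged panel model could look like, the minimal two-variable sketch below uses symbols that are assumptions for exposition, not the authors' specification: $T_t$ denotes negative teacher behavior toward a student and $D_t$ denotes peer disliking of that student at wave $t$.

```latex
\begin{align}
D_{t} &= \alpha_{D} + \beta_{DD}\, D_{t-1} + \beta_{DT}\, T_{t-1} + \varepsilon_{D,t} \\
T_{t} &= \alpha_{T} + \beta_{TT}\, T_{t-1} + \beta_{TD}\, D_{t-1} + \varepsilon_{T,t}
\end{align}
```

Here $\beta_{DD}$ and $\beta_{TT}$ capture the stability of each construct, $\beta_{DT}$ corresponds to the social referencing direction (teacher behavior preceding peer disliking), and $\beta_{TD}$ corresponds to the reverse direction (peer disliking preceding teacher behavior).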

Second, both the mediators and outcomes were measured
using single-item peer nominations, an approach that is
more susceptible to measurement error than using multiple
items. However, because the proportion scores came from multiple
informants, we argue that this has not substantially distorted the
findings. This approach also raises concerns about shared method
variance, as the data coming from the same participants could
partly account for the associations. Nonetheless, any method
other than asking peers directly about their perceptions of a
student would have compromised the validity of the measures. To limit
shared method variance, we collected data on the mediating
and outcome variables at different time points.

One of the major strengths of the current study, as opposed to
several earlier studies (e.g., Chang et al., 2007; De Laet et al.,
2014; Hughes & Chen, 2011), was that we observed actual teacher
behavior, instead of relying on teacher reports as a proxy. A
drawback of this method is that teachers could have behaved
differently than when no camera was present in the classroom. To
mitigate this concern, we stayed out of the classroom to
minimize intrusiveness, and when asked after the recording,
most teachers mentioned that they got used to the camera very quickly.
Another drawback of observations is that they are time intensive
and necessarily cover only a limited time frame. Some students did
not have interactions with the teacher at all, and the
representativeness of the coded teacher behavior might therefore be limited.
Note, however, that more than 18,000 incidents were coded.

Future  Directions  for  Social  Referencing  Research 

While the present study connected actual teacher behavior to
peer perceptions of the teacher-student relationship, and thereby
has provided further empirical ground for social referencing, future
research can undertake several steps that would lead to deeper
understanding of teacher effects on students’ affective peer
evaluations. First, existing research has mainly focused on how
students within a class are treated differently by their teacher. Future
research could also incorporate differences between classes
regarding overall positivity or negativity in the teacher’s behavior or
the amount of differential teacher behavior. For instance, in classes
where teachers in general treat their students more positively,
students in general may like each other better (see Gest & Rodkin,
2011; Hendrickx, Mainhard, Boor-Klip, Cillessen, & Brekelmans,
2016; Hughes et al., 2006), but students who do receive many
negative teacher comments might also stand out more and be more
strongly disliked. Thus, research on social referencing could
benefit from employing a multi-level approach that also includes
teacher-level effects.

Second,  social  referencing  theory  implies  that  an  individual 
classmate  witnesses  teacher  behavior  towards  a  student,  has  an 
impression  of  the  teacher’s  evaluation  of  the  student,  and  bases  his 
or  her  own  liking  of  the  student  on  this  impression.  Because  of  the 
focus  on  a  student’s  peer  reputations  in  the  entire  classroom  group, 
existing  research  cannot  yet  make  claims  about  such  intrapersonal 
processes.  Social  referencing  research  would  profit  from  adopting 
analysis  techniques  aimed  at  unraveling  the  antecedents  of  liking 
or  disliking  between  two  individual  students  (e.g.,  Snijders,  2001). 

Third,  in  the  current  as  well  as  prior  research  (McAuliffe  et  al., 
2009),  only  teachers’  verbal  behavior  was  examined.  Nonverbal 
teacher  behavior  directed  towards  a  student  is  likely  to  contain 
additional  cues  that  may  inform  peers’  evaluations  of  a  student  as 
well.  This  could  include  smiling  or  giving  a  student  a  thumbs-up 
on  the  positive  side,  and  frowning  or  raising  one’s  voice  on  the 
more  negative  side. 

Finally, the field would benefit from comparing social
referencing processes across age groups so that we can pinpoint the stages
in development in which peer status is particularly prone to the
influences of teacher behavior. Several studies with differing age
groups, ranging from 6-year-olds (Chang et al., 2007; Hughes et
al., 2006) to adolescents (Engels et al., 2016), have resulted in
differing findings, but differences in methods make interpretation
difficult.

Practical  Implications 

When teachers are aware of the possible detrimental impact of
their behavior toward a student on peer disliking, they can use this
information to interact strategically with their students. Adapting
teacher behavior in public interactions with individual students
may also be an avenue to improve the peer status of students, in
addition to social or behavioral training for the student (e.g.,
Bierman & Powers, 2009). From our findings, the most important
message to teachers is to be wary of publicly communicating
negative affect toward a student; investing in positive
interactions with disliked students is probably less effective than
decreasing the amount of negative interaction. This is an essential
finding, as both faking positive reactions and suppressing negative
ones require emotional labor (Glomb & Tews, 2004). Intervention
studies targeting teacher-student interaction (e.g., Reinke,
Lewis-Palmer, & Merrell, 2008; Spilt, Koomen, Thijs, & Van der Leij,
2012) show that teachers, particularly when provided with
feedback, are well able to adjust their interactions with specific
students to become less conflicted and more supportive. However,
withholding negative attention may not always be suitable,
as some student behaviors can be strongly undesirable, or even
harmful, and need to be stopped. It could, therefore, be important
to invest in finding ways to keep negative comments subtle and
only noticeable to the student in question, so that the teacher can
correct undesirable behavior without drawing too much attention.

Appendix

Path Analysis Results of the Basic Social Referencing Model, Predicting Peer-Perceived Teacher-Student
Relationship, and Peer Status

                                  Teacher-student relationship                      Peer status
                                  PRTL T2                  PRTD T2                  Peer liking T3           Peer disliking T3
Predictor                         B (SE)         β         B (SE)         β         B (SE)         β         B (SE)         β
Social referencing variables
  Positive teacher affective T1   .00 (.00)      .00       .00 (.00)      -.00
  Negative teacher affective T1   -.00 (.00)*    -.04      .01 (.00)**    .11
  Positive teacher cognitive T1   .00 (.00)      .04       -.00 (.00)     -.02
  Negative teacher cognitive T1   .00 (.00)      -.00      .01 (.00)*     .06
  PRTL T2                                                                           .03 (.04)      .03       -.00 (.03)     .00
  PRTD T2                                                                           -.04 (.03)     -.05      .21 (.04)**    .24
Control variables
  Prosocial behavior T1           .37 (.04)**    .43       -.08 (.03)**   -.07      .45 (.04)**    .49       -.24 (.03)**   -.26
  Overt aggression T1             .02 (.03)      .02       .49 (.04)**    .59       -.04 (.03)     -.06      .17 (.04)**    .23
  Relational aggression T1        -.13 (.03)**   -.14      .12 (.03)**    .10       .00 (.03)      .00       .09 (.03)**    .09
  Social withdrawal T1            .12 (.03)**    .12       .03 (.03)      .02       -.16 (.02)**   -.15      .29 (.03)**    .26
  Academic achievement T1         .02 (.02)      .04       -.02 (.02)     -.04      -.03 (.01)*    -.06      .01 (.02)      .02
  Gender                          .07 (.01)**    .31       -.02 (.01)*    -.07      -.04 (.01)**   -.15      .03 (.01)**    .10
R²                                .43                      .62                      .31                      .48

Note. Model fit was excellent, χ²(8) = 9.57, p = .297, root mean square error of approximation (RMSEA) = .01, comparative fit index (CFI) = 1.00,
Tucker-Lewis index (TLI) = 1.00, standardized root mean square residual (SRMR) = .01. T1 = Time 1; T2 = Time 2; T3 = Time 3; PRTL = Peer
Reputation of Teacher Liking; PRTD = Peer Reputation of Teacher Disliking.
* p < .05. ** p < .01.

Extending Attribution Theory: Considering Students’ Perceived Control of the Attribution Process


Evan  J.  Fishman 

Stanford  University 


Jenefer  Husman 
University  of  Oregon 


Research in attribution theory has shown that students’ causal thinking profoundly affects their learning
and motivational outcomes. Very few studies, however, have explored how students’ attribution-related
beliefs influence the causal thought process. The present study used the perceived control of the
attribution process (PCAP) model to examine the motivational impact of these beliefs. PCAP consists of
2 subconstructs: perceived control of attributions (PCA), which refers to students’ perceived capability
to influence attributional thought; and awareness of motivational consequences of attributions (AMC),
which refers to students’ understanding that attributions have behavioral and psychological
consequences. We pursued 4 research goals and found evidence to support the following: (a) PCA and AMC
are structurally independent beliefs; (b) PCA and AMC are differentially related to motivational
outcomes; (c) levels of PCA and AMC vary significantly between controllable and uncontrollable events;
and (d) the PCAP model is valid, in that PCA and AMC were related to cognitive reappraisal strategies,
which, in turn, mediated a path toward an adaptive attribution style, autonomy, and subjective well-being.
Students who adopted PCA and AMC experienced more favorable motivational outcomes than students
who adopted 1 or neither of the beliefs. The results suggest that these attribution-related beliefs enhance
the quality of students’ causal thinking and help to sustain a sense of autonomy and well-being.

Keywords:  causal  attributions,  perceived  control,  attribution  theory,  student  motivation,  causal  thinking 


Whether it is a failed midterm or social rejection, university
students frequently encounter stressful events that elicit causal
thought, an internal investigation into what caused the event.
Research in attribution theory has been critical in understanding
how this internal investigation, or causal search, affects student
outcomes (Graham & Weiner, 2012). This research
has shown that students often engage in causal thinking and
that their motivation is influenced by how they attribute causes to
outcomes (Weiner, 1985, 2010). Still, additional work is needed to
understand the inherent function of causal attributions. Many
attribution theorists contend that students engage in causal search to
acquire or sustain a sense of control over their environment
following stressful events. These stressful outcomes threaten
students’ sense of control; thus, a search for causality following such
outcomes provides a sense of structure, understandability, and
predictability of their environment (Keinan & Sivan, 2001). This
sense of structure, however, does not create long-term positive
outcomes (Stupnisky, Stewart, Daniels, & Perry, 2011).

These findings indicate that to better understand the impact of
students’ causal thinking, a broader approach in which students’
personal beliefs are considered may be required. In fact, Weiner
(2010) recognized that there are personal variables, such as beliefs and
orientations, that precede causal thought. In other words, there are
factors beyond the traditional model of attribution theory that
contribute to whether students’ causal attributions lead to positive
motivational outcomes. Some have argued that students’ perceived
capability to determine the cause of an event may be one such factor.
Tobin and Raymundo (2010) posited that students’ chronic
uncertainty in their ability to determine the cause of an outcome leads to
more maladaptive attributions, a lack of primary control, and lower
well-being. This suggests that students who feel capable of determining the
cause of outcomes would have a motivational advantage over students
who lack such a belief and that perceived control of the attribution
process enhances the quality of students’ causal thinking.

In this study, we examined whether students’ perceived control of
the attribution process was related to favorable motivational
outcomes. In doing this, we sought to enrich the study of causal
attributions by considering how students’ personal attribution-related beliefs
affect their causal reasoning. We used the perceived control of the
attribution process (PCAP) model to achieve this goal (Fishman,
2014b). This model stems from traditional constructs of perceived
control; however, it is unique in that it represents a perceived control
of an internal process, rather than external circumstances. PCAP
consists of two subconstructs: perceived control of attributions
(PCA), which refers to an internal locus and perceived capability to
influence attributional thought; and awareness of the motivational
consequences of attributions (AMC), which refers to an
understanding that attributions are linked to psychological and behavioral
consequences. These beliefs can be represented as naive theories: “I’m
the one who determines why things happen” and “those determinations
affect me,” respectively. Although these beliefs are not new to the
human condition, they have remained relatively absent from
educational research; thus, their impact on student motivation is unclear.
The present study took an empirical approach to examine the
motivational implications of these beliefs.

The Role of Causal Attributions in the Strive for Control

The need for control is so fundamental it has been described as
the central motive that guides human behavior (Heckhausen &
Schulz, 1995). Perceived control refers to an internal locus and
perceived capability to influence daily events (Thompson, 2002).
Numerous studies have demonstrated the importance of control
beliefs in predicting students’ achievement, affect, well-being, and
self-regulation (e.g., Pekrun, 2006; Perry, Hladkyj, Pekrun,
Clifton, & Chipperfield, 2005; Perry, Hladkyj, Pekrun, & Pelletier,
2001; Schunk & Ertmer, 2000; Shell & Husman, 2008; Stupnisky,
Perry, Hall, & Guay, 2012). Given the strong evidence that
perceived control is advantageous, it is not surprising that students
prefer this perspective and engage in several internal strategies to
maintain it. Researchers have found that those who lack control
will endorse governmental influence (Kay, Gaucher, Napier,
Callan, & Laurin, 2008), higher powers (Morling & Fiske, 1999), and
paranormal abilities (Greenaway, Louis, & Hornsey, 2013) and
imagine patterns, develop conspiracies, and create superstitions to
regain a sense of control (Whitson & Galinsky, 2008).
Conceptually, aligning oneself with a greater, stable, omnipresent entity
helps students to avoid feelings of chaos and randomness.
Rothbaum and colleagues (1982) theorized that individuals engage in
these secondary (cognitive) control strategies when primary
control (perceived control of the environment) is lost. The authors
suggested that these actions help students to accept a situation that
is beyond control or “fit in” with uncontrollable circumstances
(Morling & Evered, 2006).

Causal  thinking  is  another  strategy  that  students  use  to  sustain  a 
perception  of  control.  The  attribution  process  is  frequently  triggered 
by  unexpected,  negative,  and/or  important  events  that  induce  stress 
and  threaten  control  beliefs  (Lazarus  &  Folkman,  1984;  Wong  & 
Weiner,  1981).  Early  attribution  theorists  postulated  that  students 
engage  in  the  attribution  process  to  make  sense  of  the  world  and 
regain  control  over  their  environment  (Heider,  1958;  Kelley,  1967). 
Weiner  (1985)  submitted  that  students’  desire  for  mastery  is  a  central 
motive  for  causal  exploration.  Keinan  and  Sivan  (2001)  found  that,  in 
stressful  situations,  those  who  desired  more  control  made  more  causal 
attributions.  Similarly,  students  who  are  deprived  of  control  are  more 
likely  to  engage  in  attributional  activity  (Pittman  &  Pittman,  1980).  If 
individuals engage in attributional processes to regain a sense of
control over their lives, why do so many students suffer
motivationally following their causal explanations?

Attribution  Theory 

Research in attribution theory has explored the motivational
consequences of such causal thought (Weiner, 1985, 1991, 2000,
2011). Weiner’s (2010) attribution-based theory of motivation
demonstrates how different causal ascriptions lead to different
motivational outcomes. Students will attribute their outcomes to a
variety of causes, such as ability, effort, task difficulty, and luck
(McClure et al., 2011). All attributions can be classified along three
causal dimensions: locus, stability, and controllability. Locus
refers to the location of a cause, whether it originated from an
internal or external source. Stability refers to the duration of a
cause, whether it is considered lasting (stable) or temporary
(unstable). Controllability refers to the degree to which the cause can
be volitionally altered. For example, effort is often considered
internal, unstable, and controllable, whereas ability could be
considered internal, stable, and uncontrollable. Each of these causal
dimensions is linked to specific psychological and behavioral
consequences. Because of these consequences, some attributions
are motivationally adaptive, whereas some are maladaptive. For
example, if a student discovers that he was not accepted to his
college of choice, he might attribute the outcome to a lack of
intelligence (internal and uncontrollable attribution). There is
strong evidence that this is a maladaptive attribution, as his
expectancy for future success and his affect are likely to suffer from this
explanation. Or, he might attribute the outcome to a bad
application strategy (internal and controllable attribution). There is strong
evidence that this attributional pattern would lessen negative affect
and support his motivation to persist; thus, it is considered an
adaptive attribution (Cleary & Zimmerman, 2001).
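
The three-dimensional classification described above can be made concrete with a small data structure. The sketch below is only an illustration: the class `Attribution`, the constants, and the helper `is_adaptive_after_failure` are hypothetical names, and the helper encodes a deliberately simplified rule (an assumption) that covers only the two internal-cause examples given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    """A causal explanation placed on the three causal dimensions."""
    cause: str
    internal: bool           # locus: internal (True) or external (False)
    stable: Optional[bool]   # stability: None where the text does not specify it
    controllable: bool       # controllability: can the cause be volitionally altered?


# Classifications taken from the examples in the text.
EFFORT = Attribution("effort", internal=True, stable=False, controllable=True)
ABILITY = Attribution("ability", internal=True, stable=True, controllable=False)
LOW_INTELLIGENCE = Attribution(
    "lack of intelligence", internal=True, stable=None, controllable=False
)
BAD_STRATEGY = Attribution(
    "bad application strategy", internal=True, stable=None, controllable=True
)


def is_adaptive_after_failure(attribution: Attribution) -> bool:
    """Simplified rule (an assumption): for internal causes of a failure,
    a controllable cause is treated as adaptive and an uncontrollable one
    as maladaptive. External causes are not covered by the text's examples."""
    if not attribution.internal:
        raise ValueError("only internal causes are covered by this sketch")
    return attribution.controllable


print(is_adaptive_after_failure(BAD_STRATEGY))
print(is_adaptive_after_failure(LOW_INTELLIGENCE))
```

Stability is left optional because the college-admission example specifies only locus and controllability for its two attributions.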

Hence, attribution theory illustrates that not all causal thought is
motivationally beneficial. Even though students may repeatedly
experience poor outcomes because of their attributional process, it
is unlikely that they will change their attributional thinking
because the process occurs automatically and is driven by past
experiences and existing thought patterns (Weiner, 2010). If
students do not exercise control of their causal reasoning, they may
not be able to avoid making maladaptive attributions that hinder
their motivation. Although not every student may hold them,
we argue that there are personal beliefs that support students’
ability to control their attribution process, increasing the likelihood
of adaptive attributional outcomes.

Weiner (2000) recognized that attributions are influenced by
causal antecedents such as past history, social norms, hedonic
biasing, and so forth. Similarly, students who endorse performance
goals tend to make more attributions to ability than do students
who endorse mastery goals (Shell & Husman, 2008; Nelson, Shell,
Husman, Fishman, & Soh, 2015; Wolters, Fan, & Daugherty,
2013). One line of research has shown that students’ causal
attributions are dictated by their implicit beliefs of intelligence (e.g.,
Dweck, Mangels, Good, Dai, & Sternberg, 2004; Hong, Chiu,
Dweck, Lin, & Wan, 1999). Given that students’ beliefs and
orientations influence their causal thinking, surprisingly few
studies have examined how students’ attribution-related beliefs might
influence their causal thinking. There is reason to believe that if
students felt capable of influencing their attribution process, they
would be more likely to do just that, which would, in turn, allow
them to more easily thwart the negative motivational consequences
of maladaptive causal thinking.

Students can be taught to embrace more adaptive attribution-related
perspectives. Perry and colleagues (2014) found that struggling
college students who attend attributional retraining (AR) programs can
make significant improvements in academic achievement,
persistence, and perceptions of control (Haynes, Ruthig, Perry, Stupnisky,
& Hall, 2006). These interventions help students to recognize the
difference between adaptive and maladaptive attributions. Although
these ARs have been successful in producing positive student
outcomes, the underlying mechanism by which this occurs is unclear.
Some might argue that by attending these ARs students become aware
of their own causal thinking and learn that they control this process,
which may be a dominant reason for the students’ gains. Further
research is needed to investigate this claim.

Perceived  Control  of  the  Attribution  Process  (PCAP) 

The PCAP model was used to explore how students’
attribution-related beliefs might affect their motivational outcomes. The
PCAP model posits that students who perceive control of
attributions and are aware of the motivational consequences of
attributions are more likely to experience autonomy and well-being (see
Fishman, 2014b). Students who perceive control of their
attributions feel that they are the ones who ultimately control the
explanation of an event. In other words, they believe it is “up to them”
to determine why things happen. We propose that PCA does not
reflect a set of strategies, but, rather, an underlying control belief
that allows students to retain the motivational benefits of primary
control. We based this argument on prior research that suggests the
cognitive strategies that students use to regain control should not
fall under the label of control because they stem from “abilities” or
“strivings,” rather than beliefs (Skinner, 2007). Single strategies
for coping with a stressful event (e.g., downgrading expectations)
do not fall under the umbrella of a “control belief”; rather, control
beliefs frame the entire attribution process.

Additionally, we argue that students who are aware that their
attribution for an event will affect them motivationally are more
likely to see the value in causal thinking and focus on making
adaptive attributions. That is, they understand that their
explanation for an event will influence how they react to it. These beliefs
are considered meta-cognitive because they reflect a higher-order
cognitive process.

According  to  the  model,  PCA  and  AMC  allow  students  to  more 
easily  disengage  from  the  attribution  process  and  circumvent  the 
negative  motivational  consequences  of  maladaptive  attributions. 
These  beliefs  promote  cognitive  strategies  that  help  students  reg¬ 
ulate  their  causal  thinking  which  subsequently  leads  to  a  more 
adaptive  attribution  style  (see  Figure  1).  For  example,  if  a  student 
fails  a  course,  she  is  likely  to  engage  in  causal  search.  If  there  is  no 
obvious  reason  for  this  outcome,  she  is  likely  to  consider  many 
causes  (Forsyth,  Story,  Kelley,  &  McMillan,  2009;  Weiner,  1985), 
some  of  which  are  motivationally  adaptive  (e.g.,  “I  didn’t  try  hard 
enough”),  and  some  of  which  are  maladaptive  (e.g.,  “I’m  not  smart 
enough”).  Typically,  her  causal  thinking  would  be  executed  be¬ 
yond  awareness,  driven  by  the  specific  circumstances  of  the  event 
and  the  causal  antecedents  she  has  developed  over  time.  However, 
if  she  becomes  aware  of  her  attributional  process  and  believes  that 
it  is  “up  to  her”  to  frame  her  thinking  about  why  she  failed  the 
course,  she  is  more  likely  to  disengage  from,  or  exercise  control  of, 
the process. This cognitive intervention may come in the form of
reappraisal (e.g., “I don’t know why I failed, but because it’s up to
me I won’t assume the worst”) or behavioral disengagement (e.g.,
“I don’t know why I failed, but because it’s up to me I won’t worry
about it”; Carver, Scheier, & Weintraub, 1989). These cognitive
strategies are said to occur when the cause of the event is unknown
or ambiguous. The dotted lines in Figure 1 represent the path when
cognitive strategies are used.

Figure 1. A conceptual model of the perceived control of the attribution
process. PCA = perceived control of attributions; AMC = awareness of
the motivational consequences of attributions.

If there is an obvious cause for her failure (e.g., absent throughout
the semester), her causal search is not elaborate and these
cognitive  reappraisal  strategies  are  not  necessary.  That  is,  students 
are  not  likely  to  engage  in  strategies  to  change  their  causal  expla¬ 
nations  if  the  cause  is  obvious.  After  all,  the  impetus  of  causal 
search  is  to  understand  the  reality  of  a  situation  (Heider,  1958). 
However,  the  negative  event  is  likely  accompanied  by  additional 
questions  in  which  her  PCAP  beliefs  are  advantageous  (e.g.,  “Why 
did I miss so many classes?”; “Why didn’t I get support from my
teacher?”)  and  in  which  these  cognitive  reappraisal  strategies  can 
be  used.  Thus,  viewing  PCAP  with  respect  to  a  single  event  does 
not  fully  capture  the  potential  impact  of  the  beliefs.  A  pilot  study 
by  Fishman  (2014a)  found  that  general-PCA  and  AMC  are  more 
strongly  linked  to  autonomy  than  event-specific  versions  of  the 
beliefs.  The  study  also  showed  that  students  were  more  likely  to 
perceive control of attributions for controllable events (e.g., failed
midterm test) than for uncontrollable events (e.g., pop quiz). This
makes  sense,  as  attributions  are  intrinsically  tied  to  the  event  for 
which they are made (Berntsen & Rubin, 2006). That causal
attributions cannot be dissociated from their originating events
implies that attribution-related beliefs are equally tied to those
events, suggesting that general PCAP is a more holistic and accurate
representation of students’ PCA and AMC on average. Hence,
the  PCAP  beliefs  are  best  conceptualized  as  general. 

Importantly,  attribution  theorists  have  discussed  how  causal 
thought  occurs  not  only  for  self-directed  outcomes,  but  also  for 
other-directed  outcomes  (Weiner,  2000)  or  others’  behavior 
(Heider,  1958).  This  is  especially  relevant  in  the  social  context  of 
school  where  dysfunctional  interactions  with  teachers  or  class¬ 
mates  can  have  a  negative  impact  on  students’  achievement  and 
well-being  (Kaplan,  Liu,  &  Kaplan,  2005;  Weiner,  1994).  In  fact, 
social  stressors  may  be  more  frequent  and  impactful  than  academic 
stressors,  especially  for  adolescent  students  (Teicher,  Samson, 
Sheu,  Polcari,  &  McGreenery,  2010).  We  propose  that  perceived 
control  of  the  attribution  process  is  relevant  in  the  social  and 
learning  contexts  of  school.  Students  who  implicitly  believe  that  it 
is “up to them” to determine why an event occurred and understand
that such a determination will affect them will, on average,
find  themselves  better  equipped  to  deal  with  stressful  social  events. 
For  instance,  some  social  interactions  can  have  detrimental  effects 
on  students’  affect,  behavior,  and  learning  (e.g.,  “Why  didn’t 
Jimmy  ask  me  to  prom?”).  A  student’s  causal  thinking  about  these 
events  can  exacerbate  the  negative  effects  (e.g.,  “He  thinks  I’m 
unattractive”)  or  reduce  them  (e.g.,  “He’s  too  shy”).  But,  if  the 
student  feels  capable  of  influencing  the  process,  she  is  more 
inclined  to  disengage  and  express  autonomy  (e.g.,  “It’s  up  to  me  to 
decide  why  he  didn’t  ask,  so  I  won’t  assume  the  worst”). 

The  PCAP  model  includes  autonomy  as  an  outcome,  rather  than 
as  primary  control,  because  these  beliefs  are  proposed  to  engender 
feelings  of  agency.  Theoretically,  because  PCAP  represents  a  type 
of perceived control, students’ need for competence is satisfied,
which  allows  for  autonomous  strivings  (Ryan  &  Deci,  2000).  As 
proposed  in  the  PCAP  model,  PCA  and  AMC  allow  students  to 
more  easily  circumvent  the  negative  motivational  consequences  of 
the  attribution  process.  Thus,  students  who  perceive  control  of 
their causal thinking may feel a greater sense of self-authorship, an
internal  locus  that  supports  self-governance. 

The  Present  Study 

The  primary  goal  of  this  study  is  to  contribute  to  the  field  of 
attribution  theory  by  considering  the  impact  of  students’  perceived 
control of the attribution process. We argue that students’
attribution-related control beliefs add an important dimension to
the study of causal thinking and represent a unique extension of
attribution theory. We also sought to fill gaps in the literature
regarding  the  function  of  attributional  thought  following  control- 
threatening  events.  We  examined  college  students’  perceived  con¬ 
trol  of  the  attribution  process  across  both  social  and  academic 
domains.  Given  that  students  operate  in  a  multifaceted  environ¬ 
ment,  we  propose  that  this  study  provides  implications  for  stu¬ 
dents’  holistic  academic  experience. 

This  study  pursued  four  research  goals  regarding  the  validity 
and  motivational  implications  of  PCAP.  The  first  goal  was  to 
determine  whether  perceived  control  of  attributions  and  awareness 
of  the  motivational  consequences  of  attributions  are  independent 
constructs.  Both  PCA  and  AMC  are  considered  important  yet 
distinct  aspects  of  PCAP.  Although  a  student  may  feel  capable  of 
determining  why  he  failed  a  test,  for  example,  he  may  not  under¬ 
stand  how  his  determination  will  affect  how  he  feels  or  behaves 
following  the  test.  The  second  goal  was  to  demonstrate  that  PCA 
and  AMC  relate  to  important  motivational  constructs  and  that  they 
are  differentially  related  to  constructs.  This  was  intended  to  further 
validate  the  PCAP  beliefs  and  provide  evidence  for  their  indepen¬ 
dence.  The  third  goal  was  to  determine  (a)  whether  the  subjective 
controllability  of  an  event  affected  levels  of  PCA  or  AMC  and,  if 
so,  (b)  whether  PCAP  would  positively  relate  to  outcomes  when 
controlling  for  this  context.  That  is,  we  explored  whether  levels  of 
students’  PCA  and  AMC  varied  between  controllable  and  uncon¬ 
trollable  events  and  examined  whether  perceived  control  of  the 
attribution  process  functions  as  a  means  for  students  to  maintain 
motivation  in  situations  of  uncontrollability.  The  fourth  and  final 
goal  was  to  examine  the  motivational  implications  of  PCAP.  In 
alignment  with  the  PCAP  model,  we  focused  on  four  motivational 
outcomes:  autonomy,  subjective  well-being,  cognitive  reappraisal, 
and attribution style. We used Ryan and Deci’s (2006) operationalization
of autonomy, which refers to self-governance or rule by
self.  We  operationalized  subjective  well-being  as  a  composite 
score  of  life  satisfaction  and  affect.  Attribution  style  was  assessed 
using  the  Attributional  Style  Questionnaire  (ASQ;  Peterson  et  al., 
1982).  Cognitive  reappraisal  was  operationalized  as  positive  rein¬ 
terpretation  (Carver  et  al.,  1989).  According  to  the  authors,  posi¬ 
tive  reinterpretation  reflects  one’s  tendency  to  construe  stressful 
events  in  a  positive  way.  This  represents  a  tendency  to  cognitively 
intervene  following  stressful  events,  which  is  a  key  component  of 
the  PCAP  model.  The  fourth  goal  also  involved  validating  the 
variable  sequence  of  the  PCAP  model. 

Method 

Participants  and  procedures.  A  total  of  800  students  partic¬ 
ipated  in  this  study.  The  students  were  drawn  from  a  participant 
pool  administered  by  the  school  of  education  at  a  large  Southwest¬ 
ern  university.  Students  from  large  social  sciences  courses  at  the 
university were allowed to participate if permitted by their instructor.
College students were chosen as participants because, developmentally,
they are more likely than younger students to have explored and had
experience with meta-cognitive thought. Incidentally, such beliefs are
likely developed and relevant to their academic and motivational
outcomes. The majority of participants
were  female  (76%).  The  sample  was  primarily  Caucasian  (61%), 
with  16%  Hispanic/Latino,  9%  Asian,  8%  Biracial,  4%  African 
American, and 2% American Indian/Alaska Native. The majority
of  the  students  were  undergraduates  (97%).  The  age  of  the  partic¬ 
ipants  ranged  from  18  to  59;  55%  were  18  to  22  years  old,  and  the 
median  age  was  22. 

Students  who  chose  to  participate  in  the  study  were  given  a 
website  link  that  directed  them  to  the  online  survey.  The  survey 
included  all  study  measures,  and  each  student  took  the  survey  one 
time  during  the  spring  semester.  On  the  basis  of  instructor  prefer¬ 
ence,  those  who  completed  the  self-report  survey  were  either  given 
course credit or a $4 gift card to a large online retail vendor. The
online  survey  took  approximately  35  min  to  complete.  The  study 
was  approved  by  the  institutional  review  board  at  the  university, 
and  all  participants  consented  before  taking  the  survey. 

Analysis  plan.  The  four  research  goals  guiding  this  study 
required  different  analytic  approaches;  thus,  we  organized  the 
results  and  discussion  around  the  primary  research  goals.  All 
analyses  were  conducted  on  the  student  sample  described  in  the 
preceding  section;  however,  to  maximize  the  sample,  we  randomly 
selected  three  unique  subsamples  with  which  to  conduct  analyses. 
We  refer  to  these  as  Subsamples  1  (n  =  286),  2  (n  =  272),  and  3 
(n  =  242).  The  analytic  plan,  sample  used,  and  expected  results  for 
each  goal  are  described  in  the  following  paragraphs. 
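
The random partition into three non-overlapping subsamples can be sketched as follows. This is a minimal illustration, not the procedure the authors actually ran: the participant IDs, the seed, and the function name `split_into_subsamples` are assumptions; only the subsample sizes come from the text.

```python
import random

def split_into_subsamples(ids, sizes, seed=0):
    """Randomly partition ids into disjoint groups with the given sizes."""
    ids = list(ids)
    if sum(sizes) != len(ids):
        raise ValueError("sizes must sum to the number of ids")
    # Shuffle a copy with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(ids)
    groups, start = [], 0
    for size in sizes:
        groups.append(ids[start:start + size])
        start += size
    return groups

# Hypothetical participant IDs 1..800; sizes are those reported for
# Subsamples 1, 2, and 3.
subsamples = split_into_subsamples(range(1, 801), [286, 272, 242])
```

Because each participant falls in exactly one group, analyses run on different subsamples (e.g., the two EFAs and the CFA) do not reuse the same respondents.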

Goal  1.  To  determine  whether  perceived  control  of  attribu¬ 
tions  and  awareness  of  motivational  consequences  of  attributions 
are  distinct  constructs,  we  conducted  three  separate  factor  analy¬ 
ses,  each  on  a  unique  portion  of  the  sample.  The  first  exploratory 
factor  analysis  (EFA)  was  used  to  reduce  the  initial  pool  of  30 
PCAP scale items to 11 (Subsample 1). A second EFA (Subsample
2) and a confirmatory factor analysis (CFA; Subsample 3) were
conducted on this 11-item scale. We expected to find evidence for
two  separate  constructs  (PCA  and  AMC). 

Goal  2.  To  demonstrate  that  PCA  and  AMC  related  to  other 
measures as expected and that they are differentially related to
measures, a number of hierarchical regression analyses were conducted
in which both constructs were regressed on similar or
dissimilar  measures.  We  used  a  regression  approach  to  assess 
unique  variance  accounted  for  by  PCA  and  AMC  in  the  dependent 
variables.  These  analyses  were  conducted  on  the  first  random  third 
of  the  data  (Subsample  1).  The  specific  hypothesized  outcomes  are 
detailed  in  the  Goal  2  section. 

Goal  3.  The  third  research  goal  was  twofold:  to  determine 
whether  the  subjective  controllability  of  an  event  affected  stu¬ 
dents’  PCA  or  AMC  and,  if  so,  to  determine  whether  that  effect 
had  any  impact  on  how  the  beliefs  related  to  outcomes.  Because 
this  analysis  examined  the  influence  of  situational  context  on 
levels  of  PCA  and  AMC,  an  event-specific  measure  was  required 
to  gather  information  on  the  subjective  controllability  of  specific 
events.  We  used  the  Event-Specific  Perceived  Control  of  the 
Attribution  Process  Scale  (ES-PCAPS)  that  was  developed  in  a 
prior  study  (Fishman,  2014a;  see  Goal  3  section). 

First,  we  conducted  a  CFA  to  examine  the  convergence  of  the 
event-specific  and  general  versions  of  PCA  and  AMC.  We  then  ran 
a  one-way  analysis  of  variance  (ANOVA)  comparing  levels  of 
PCA  and  AMC  between  subjectively  controllable  and  uncontrol¬ 
lable  events.  Lastly,  we  conducted  a  series  of  regression  analyses 
to assess how much variance in autonomy, well-being, attribution
style, and cognitive reappraisal was accounted for by PCA,
AMC, and PCA × AMC when controlling for students’ reported
controllability  of  the  event. 
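
For readers less familiar with the one-way ANOVA step, the F ratio for a comparison between controllable and uncontrollable events can be sketched as below. The scores are hypothetical values on the 6-point scale, not study data, and the function name is an assumption made for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical PCA scores for two event types (not study data).
controllable = [4.0, 4.5, 5.0, 4.5]
uncontrollable = [3.0, 3.5, 4.0, 3.5]
f_stat, df_b, df_w = one_way_anova_f([controllable, uncontrollable])
```

With only two groups, this F ratio is equivalent to the square of an independent-samples t statistic.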

We  expected  that  the  controllability  of  the  event  would  signif¬ 
icantly  affect  levels  of  PCA  and  AMC.  Also,  it  was  expected  that 
the  event-specific  PCAP  beliefs  would  relate  to  autonomy,  well¬ 
being,  cognitive  reappraisal,  and  an  adaptive  attribution  style  when 
controlling  for  students’  reported  controllability  of  the  event. 
These analyses used the entire sample (N = 800).

Goal  4.  The  fourth  research  goal  was  to  examine  the  motiva¬ 
tional  implications  of  PCAP  and  the  validity  of  the  PCAP  model. 
To  achieve  this  goal,  we  conducted  two  analyses  using  the  entire 
student  sample.  First,  we  examined  how  PCA  and  AMC  were 
associated  with  autonomy,  subjective  well-being,  cognitive  reap¬ 
praisal,  and  attribution  style  using  the  same  hierarchical  regression 
approach  as  in  Goals  2  and  3.  Goal  4  used  the  general  PCAP 
beliefs  as  independent  variables,  as  opposed  to  Goal  3  which 
required an event-specific PCAP measure. Second, we analyzed a
structural equation model that reflected the conceptual PCAP
model  and  used  the  Sobel  (1982)  test  to  assess  mediated  paths. 
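
As a brief illustration of the Sobel (1982) test, the z statistic for an indirect effect is the product of the two path coefficients divided by its approximate standard error. The sketch below assumes hypothetical path estimates and standard errors; none of the numbers are results from this study.

```python
from math import erfc, sqrt

def sobel_test(a, se_a, b, se_b):
    """Return (z, two-tailed p) for the indirect effect a*b.

    a, se_a: predictor -> mediator path and its standard error
    b, se_b: mediator -> outcome path and its standard error
    """
    se_ab = sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = (a * b) / se_ab
    # Two-tailed p-value from the standard normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return z, p

# Hypothetical coefficients (not study results).
z, p = sobel_test(a=0.40, se_a=0.10, b=0.30, se_b=0.10)
```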

We expected that both PCA and AMC would predict self-reported
cognitive reappraisal, adaptive attribution style, autonomy, and
well-being. We hypothesized that the PCA × AMC interaction would
account for unique variance in cognitive
reappraisal  because  students  who  feel  capable  of  influencing  their 
attributions  (PCA)  and  are  aware  of  how  those  attributions  affect 
them  (AMC)  are  in  a  stronger  meta-cognitive  position  that  affords 
opportunity  and  motivational  incentive  to  cognitively  intervene, 
compared  to  students  who  endorse  only  one  of  the  beliefs.  Addi¬ 
tionally,  the  empirical  model  that  represents  the  conceptual  PCAP 
model  was  expected  to  fit  the  data.  The  anticipated  results  of  the 
specific  relationships  within  this  model  are  detailed  in  the  Goal  4 
section. 

Goal  1:  Determine  Whether  PCA  and  AMC  Are 
Independent  Constructs 

Instruments. 

Perceived Control of the Attribution Process Scale (PCAPS).
To  measure  PCAP,  we  developed  an  instrument  that  assesses 
students’  general  PCA  and  AMC  beliefs.  The  development  of  this 
scale  was  a  continuation  of  the  items  generated  in  a  pilot  study  by 
Fishman  (2014a).  This  pilot  study  generated  initial  scale  items  and 
explored  suitable  measurement  formats.  The  initial  pool  of  items 
was  evaluated  by  six  content  experts.  Four  of  them  were  either 
educators  or  researchers  in  education,  one  was  an  expert  in  law  and 
decision  making,  and  one  was  an  expert  in  human  development. 
They rated each item on clarity of statement and quality of fit
with its intended subscale and were asked to classify each item
according to the subscale it was designed to measure. Based
on the experts’ feedback, items were revised or eliminated. This
resulted in a 30-item scale that was ultimately reduced to 11 items
following scale development procedures (described subsequently).

The resulting 11-item scale is presented in Table 1. For each
item,  students  rated  their  agreement  on  a  6-point  Likert  scale  (1  = 
strongly  disagree,  2  =  disagree,  3  =  somewhat  disagree,  4  = 
somewhat  agree,  5  =  agree,  6  =  strongly  agree).  Participants 
were  given  the  following  instructions: 


Sometimes when things happen we think about why they happened.
The following statements have to do with your life IN GENERAL.
Use the scale to indicate how much you agree or disagree with each
statement.

Table 1

Pattern Matrix Factor Loadings of the PCAPS Items (PCA and AMC Subscales)

Item                                                                                           PCA    AMC
 7. The reasons why things happen in my life are for me to decide.                             .80   -.01
11. Ultimately, I’m the one who determines why things happen.                                  .79   -.02
 1. I have control over determining why things happen in my life.                              .68    .06
 5. I have a great deal of control over determining why events happen.                         .66   -.07
 9. Whether or not something happened for a greater reason is for me to decide.                .60    .04
 3. Whether or not I caused an event is ultimately my decision.                                .54    .04
 2. My feelings about an event depend on my thoughts about the event.                         -.09    .88
10. My thoughts about what caused an event will influence how I react to it.                  -.02    .80
 4. The reasons I give for what happens in my life affect how I feel and what I do about it.   .11    .63
 6. Changing my mind about what caused a situation can change how I react to it.               .00    .54
 8. When I fail at something, my feelings about it depend on why it happened.                  .02    .47

Note. PCAPS = Perceived Control of the Attribution Process Scale; PCA = perceived control of attributions;
AMC = awareness of the motivational consequences of attributions. Numbers to the left of the items indicate
their order. The first six items listed were intended for the PCA factor and the last five for the AMC factor.

Results 

First  EFA.  An  EFA  was  conducted  with  all  30  items.  It  was 
performed on approximately a random third of the sample (Subsample
1) using principal axis factoring and a direct oblimin
(oblique)  rotation.  The  two  strongest  factors  represented  the  ex¬ 
pected  PCA  and  AMC  constructs.  The  first  factor  accounted  for 
33.7%  of  the  common  variance,  whereas  the  second  factor  ac¬ 
counted  for  7.4%  of  the  common  variance.  Via  the  procedures 
discussed  by  Dawis  (2000),  we  used  the  following  item-selection 
criteria to reduce the PCAPS from 30 to 11 items. First, items were
required to load above .40 on their intended factor without
cross-loading above .32 on the other factor (Tabachnick & Fidell, 2001).
Second,  the  contribution  of  each  item  to  the  reliability  of  the 
subscale  was  assessed  using  Cronbach’s  alpha  corrected  item-total 
correlation.  Items  that  contributed  most  to  the  reliability  of  the 
subscale  were  favored.  Lastly,  items  that  correlated  most  strongly 
to  their  relevant  convergent  and  discriminant  validity  measures 
were  retained;  this  was  determined  by  looking  at  the  zero-order 
correlations  of  each  item.  This  iterative  reduction  process  was 
repeated  until  all  items  met  each  of  these  criteria.  The  result  was 
the 11-item (six PCA, five AMC) measure. A minimum of four to
five  items  per  subscale  is  suggested  to  achieve  adequate  internal 
consistency  (Dawis,  2000). 
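
The first selection criterion can be expressed as a simple rule. The sketch below reads the thresholds as factor loadings of .40 (intended factor) and .32 (other factor) and applies them to the Table 1 loadings purely for illustration; those loadings come from the second EFA on the final 11 items, so every item is expected to pass. The function and constant names are assumptions, not part of the authors' procedure.

```python
PRIMARY_MIN = 0.40  # loading required on the intended factor
CROSS_MAX = 0.32    # largest tolerated absolute loading on the other factor

# (item number, intended factor, PCA loading, AMC loading), from Table 1.
TABLE_1 = [
    (7, "PCA", 0.80, -0.01), (11, "PCA", 0.79, -0.02), (1, "PCA", 0.68, 0.06),
    (5, "PCA", 0.66, -0.07), (9, "PCA", 0.60, 0.04), (3, "PCA", 0.54, 0.04),
    (2, "AMC", -0.09, 0.88), (10, "AMC", -0.02, 0.80), (4, "AMC", 0.11, 0.63),
    (6, "AMC", 0.00, 0.54), (8, "AMC", 0.02, 0.47),
]

def meets_loading_criteria(intended, pca_loading, amc_loading):
    """Check the primary-loading and cross-loading thresholds for one item."""
    if intended == "PCA":
        primary, cross = pca_loading, amc_loading
    else:
        primary, cross = amc_loading, pca_loading
    return primary > PRIMARY_MIN and abs(cross) <= CROSS_MAX

retained = [item for item, factor, pca, amc in TABLE_1
            if meets_loading_criteria(factor, pca, amc)]
```

The remaining criteria (corrected item-total correlations and item-level validity correlations) require the raw responses and are not reproduced here.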

Second  EFA.  A  second  EFA  was  conducted  using  Subsample 
2.  In  addition,  tests  were  used  to  determine  the  number  of  factors 
to  extract.  The  scree  plot  indicated  the  presence  of  two  factors 
underlying  the  11  items,  as  did  the  minimum  average  partial 
(MAP)  test  (Velicer,  1976).  A  parallel  analysis  (O’Connor,  2000) 
based  on  random  data  generation  also  suggested  extracting  two 
factors. The two factors accounted for 47% of the common variance
(see Table 1 for items and factor loadings).

CFA.  To  further  examine  the  two-factor  solution,  we  con¬ 
ducted  a  CFA  on  the  last  random  third  of  the  sample  (Subsample 
3). Mplus 6 (Muthén & Muthén, 2012) was used for this analysis.
A  robust  maximum  likelihood  estimator  was  used  to  adjust  for 
non-normality  and  accommodate  the  data  that  were  missing  at 
random.  The  fit  of  each  model  was  evaluated  using  the  chi-square 
significance  test,  the  root  mean  square  error  of  approximation 
(RMSEA), the comparative fit index (CFI), the Tucker-Lewis
Index (TLI), and the standardized root-mean-square residual
(SRMR).  The  cut-off  criteria  suggested  by  Hu  and  Bentler  (1999) 
were  used  as  a  means  to  determine  quality  of  fit  (i.e.,  CFI  &  TLI  > 
.95,  SRMR  <  .08,  RMSEA  <  .06).  To  determine  whether  a  single 
factor  was  underlying  the  PCA  and  AMC  items,  we  tested  a 
one-factor  solution  as  an  alternative.  This  one-factor  solution  did 
not  adequately  fit  the  data  (see  Table  2  for  model  fit  indices).  The 
fit  indices  showed  a  marked  improvement  with  the  addition  of  the 
second  factor.  The  modification  indices  indicated  that  it  may  be 
necessary to allow items AMC2 and AMC10 to covary. The
similar  wording  of  these  items  warranted  this  modification.  A 
chi-square  difference  test  (Satorra  &  Bentler,  2001)  showed  that 
the respecified model was a significantly better fit to the data,
Δχ²(1) = 12.67, p < .001. This and the overall fit indices suggest
that  the  respecified  two-factor  model  is  preferred. 
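
To make the model comparison concrete, the Hu and Bentler (1999) cut-offs can be applied to the fit indices reported in Table 2. The sketch below is illustrative only; the dictionary layout and function name are assumptions, while the numeric values are those given in Table 2.

```python
# Cut-offs as stated in the text: CFI and TLI > .95, SRMR < .08, RMSEA < .06.
CUTOFFS = {"CFI": 0.95, "TLI": 0.95, "SRMR": 0.08, "RMSEA": 0.06}

# Fit indices copied from Table 2.
MODELS = {
    "one-factor": {"CFI": 0.621, "TLI": 0.526, "SRMR": 0.124,
                   "RMSEA": 0.155, "AIC": 7917.93},
    "two-factor": {"CFI": 0.923, "TLI": 0.902, "SRMR": 0.048,
                   "RMSEA": 0.070, "AIC": 7667.35},
    "two-factor, AMC2 with AMC10": {"CFI": 0.970, "TLI": 0.961, "SRMR": 0.042,
                                    "RMSEA": 0.045, "AIC": 7627.85},
}

def meets_cutoffs(fit):
    """True only if all four indices satisfy their cut-off."""
    return (fit["CFI"] > CUTOFFS["CFI"]
            and fit["TLI"] > CUTOFFS["TLI"]
            and fit["SRMR"] < CUTOFFS["SRMR"]
            and fit["RMSEA"] < CUTOFFS["RMSEA"])

verdicts = {name: meets_cutoffs(fit) for name, fit in MODELS.items()}
# A lower AIC indicates the preferred model among those compared.
best_by_aic = min(MODELS, key=lambda name: MODELS[name]["AIC"])
```

Under these criteria only the respecified two-factor model satisfies all four cut-offs, and it also has the lowest AIC, which is consistent with the conclusion drawn in the text.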

Psychometric  properties  of  the  PCA  and  AMC  subscales. 
The  means  and  standard  deviations  of  the  PCA  and  AMC  sub¬ 
scales  were  comparable  across  the  three  unique  samples  (see  Table 
3).  Mean  scores  for  AMC  were  consistently  higher  than  scores  for 
PCA. Cronbach’s alpha was used to assess reliability for both
subscales  in  each  subsample.  PCA  was  consistent  at  .84  in  each 
sample,  whereas  AMC  ranged  from  .78  to  .80.  Both  subscales 
were  stable  across  samples  and  demonstrated  respectable  levels  of 
reliability  (DeVellis,  2003). 
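
For reference, Cronbach’s alpha for a k-item subscale is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the responses are hypothetical and are not drawn from the study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 3 items on a 6-point scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 6, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
```

Alpha rises as items covary more strongly relative to their individual variances, which is why items contributing most to subscale reliability were favored during item selection.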

PCA  and  AMC  were  significantly  (p  <  .05)  correlated  in  each 
of  the  three  samples  (Subsample  1,  r  =  .43;  Subsample  2,  r  =  .38; 
Subsample  3,  r  =  .36).  This  relationship  is  not  surprising  given 
that  both  the  PCA  and  AMC  items  represent  meta-cognitive  beliefs 
of  causality.  It  is  common  for  subconstructs  to  covary  because  they 
collectively  represent  a  higher  order  variable  (e.g.,  Lim  &  Chap¬ 
man,  2015).  In  this  case,  PCA  and  AMC  reflect  a  specific  aspect 
of  students’  internal  phenomena.  Despite  the  moderate  correlation, 
the  results  suggest  that  PCA  and  AMC  are  separate  constructs. 

Discussion — Goal  1.  The  first  goal  was  to  determine  whether 
perceived  control  of  attributions  and  awareness  of  the  motivational 
consequences  of  attributions  are  separate  factors.  In  each  analysis, 
the  two-factor  structure  (PCA  and  AMC)  was  supported.  The  CFA 
demonstrated  that  a  two-factor  model  was  a  better  fit  of  the  data 
than  a  one-factor  model.  This  provides  substantial  evidence  for  the 
structure  of  PCAP  and  suggests  that  PCA  and  AMC  are  indepen¬ 
dent  constructs.  Both  the  PCA  and  AMC  subscales  demonstrated 
respectable internal consistency in three separate samples. That
both subscales were reliable in all three subsamples provides
strong evidence for the reliability of the PCAPS.


Table  2 

Goodness-of-Fit Indices for CFA Models, PCAPS

Model                                    χ²        df   CFI    TLI    SRMR   RMSEA (90% CI)       AIC
One-factor model                         293.27*   44   .621   .526   .124   .155 [.138, .172]    7,917.93
Two-factor model                          93.36*   43   .923   .902   .048   .070 [.051, .090]    7,667.35
Two-factor model (AMC 2 with AMC 10)      61.78*   42   .970   .961   .042   .045 [.016, .067]    7,627.85

Note. CFI = comparative fit index; TLI = Tucker-Lewis Index; SRMR = standardized root mean square residual; RMSEA = root mean square error
of approximation; 90% CI = confidence interval for RMSEA; AIC = Akaike information criterion; AMC = awareness of the motivational consequences
of attributions. (Respecifications to the previous model are given in parentheses after the model name; AMC 2 and AMC 10 refer to the item numbers
identified in Table 1).

*p < .01.

Table 3

Descriptive Statistics for PCA and AMC on Each Subsample

           Subsample 1 (n = 286)    Subsample 2 (n = 272)    Subsample 3 (n = 242)
Measure    M      SD     α          M      SD     α          M      SD     α
PCA        3.80   1.01   .84        3.93    .99   .84        3.73   1.02   .84
AMC        4.47    .82   .78        4.49    .82   .79        4.59    .86   .80

Note. PCA = perceived control of attributions; AMC = awareness of the
motivational consequences of attributions.

Goal  2:  Demonstrate  That  PCA  and  AMC  Relate  to 
Important  Motivational  Constructs  and  That  They 
Are  Differentially  Related  to  Constructs 

Convergent  measures.  It  was  hypothesized  that  the  PCA 
subscale  would  significantly  and  positively  relate  to  mastery  (Pear- 
lin  &  Schooler,  1978)  and  internal  attributions  (Peterson  et  al., 
1982).  Mastery  refers  to  individuals’  beliefs  about  their  ability  to 
influence  and  control  their  general  life  experiences.  Internal  attri¬ 
butions  refer  to  one’s  tendency  to  attribute  causes  to  internal 
factors.  These  predictions  were  made  because  PCA  represents  an 
internal  locus  and  a  general  perception  of  control.  We  expected 
AMC  to  significantly  and  positively  relate  to  connectedness  (Hus- 
man  &  Shell,  2008)  and  causal  importance  (Tobin  &  Weary, 
2008).  Connectedness  refers  to  the  awareness  of  how  a  present 
action  relates  to  a  future  outcome.  In  this  sense,  we  anticipated  that 
students  who  are  “connected”  would  be  more  inclined  to  see  how 
their  present  action  (attribution)  would  impact  future  outcomes 
(psychological  and  behavioral  consequences  of  attributions).  Sim¬ 
ilarly,  we  expected  that  students  who  understand  this  concept 
would  find  value  in  their  causal  explanations,  which  is  represented 
as  causal  importance. 

Discriminant  measures.  It  was  hypothesized  that  PCA  would 
not  relate  to  connectedness  and  causal  importance,  whereas  AMC 
would  not  relate  to  mastery  and  internal  attributions.  To  differen¬ 
tiate  PCAP  from  maladaptive  orientations,  we  did  not  expect  PCA 
or  AMC  to  relate  to  interpersonal  orientation  (Deci  &  Ryan,  1985). 
Interpersonal orientation reflects a student’s belief that desired
outcomes  are  beyond  control  and  that  achievement  is  determined 
by  luck  or  fate.  Further,  to  differentiate  PCAP  from  personality 
traits  we  did  not  expect  PCA  or  AMC  to  relate  to  extroversion  or 
agreeableness.  We  did  not  expect  PCA  or  AMC  to  relate  to  any  of 
the  Big  Five  personality  traits  (Saucier,  1994);  however,  in  an 
effort  to  keep  the  student  survey  brief,  we  only  measured  these  two 
after  determining  that  they  best  represented  the  range  of  person¬ 
ality  traits  included  in  the  Big  Five. 

Instruments. 

PCAPS.  Students  completed  the  PCAPS  and  demographic 
questions  in  addition  to  the  following  measures. 

Internal  attributions.  The  ASQ  (Peterson  et  al.,  1982)  was 
used  to  assess  an  internal  attribution  style.  Participants  were  pre¬ 
sented  with  12  hypothetical  events  (six  positive  and  six  negative). 
Following  each  event,  participants  identified  a  cause  for  the  event 
and rated this cause on three different dimensions: internality (1 =
because of me, 7 = because of other people or circumstances),
stability (1 = will always be present, 7 = will never be present),
and globality (1 = influences all situations in my life, 7 =
influences only this particular situation). Each cause was rated on
a 7-point Likert scale. Only the internality scale was used for this
portion of the study (Cronbach’s α = .76).

Interpersonal  orientation.  The  General  Causality  Orienta¬ 
tions  Scale  (GCOS)  was  used  to  assess  an  interpersonal  orientation 
(Deci  &  Ryan,  1985).  It  consists  of  12  vignettes  about  problems  or 
situations  that  occur  in  life.  For  example,  “A  close  (same-sex) 
friend  of  yours  has  been  moody  lately  and  a  couple  of  times  has 
become very angry with you over ‘nothing.’ You might . . .”
Following  each  vignette  is  an  interpersonal-related  item  (i.e.,  “Ig¬ 
nore  it  because  there’s  not  much  you  can  do  about  it  anyway”). 
Students  rate  the  likelihood  that  this  would  be  their  response  to  the 
situation (1 = very unlikely, 7 = very likely; Cronbach’s α = .80).

Mastery.  The  six-item  Mastery  Scale  (Pearlin  &  Schooler, 
1978)  was  used.  Items  were  as  follows:  “I  have  little  control  over 
the  things  that  happen  to  me”  (reversed),  “There  is  really  no  way 
I  can  solve  some  of  the  problems  I  have”  (reversed),  “There  is  little 
I  can  do  to  change  many  of  the  important  things  in  my  life” 
(reversed),  “I  often  feel  helpless  in  dealing  with  the  problems  of 
my  life”  (reversed),  “Sometimes  I  feel  that  I’m  being  pushed 
around  in  life”  (reversed),  “What  happens  to  me  in  the  future 
mostly  depends  on  me”,  and  “I  can  do  just  about  anything  I  really 
set  my  mind  to  do.”  Each  item  was  rated  on  a  4-point  Likert  scale 
(1 = strongly disagree, 4 = strongly agree; Cronbach’s α = .67).

Connectedness.  To  measure  connectedness,  a  subscale  of  the 
Future  Time  Perspective  Scale  was  used  (Husman  &  Shell,  2008). 
Students rated six items (e.g., “One should be taking steps today to
help realize future goals”) on a 6-point Likert scale (1 = strongly
disagree, 6 = strongly agree; Cronbach’s α = .78).

Causal importance. A six-item Causal Importance Scale (Tobin
& Weary, 2008) was used to measure participants’ perceived
value in finding a cause for an event (e.g., “It is important to know
the causes for a person’s behavior”). Each item was rated on a
6-point Likert scale (1 = strongly disagree, 6 = strongly agree;
Cronbach’s α = .84).

Personality traits. To assess participants’ extroversion and
agreeableness traits, we used Goldberg’s Mini-Markers (Saucier,
1994). Each trait corresponded with eight adjectives. Participants
rated how accurately each adjective described them on a
9-point scale (1 = extremely inaccurate, 9 = extremely accurate;
Cronbach’s α: extroversion = .85, agreeableness = .83).

Social desirability. The Marlowe-Crowne Social Desirability
Scale Short Form (Reynolds, 1982) was used to gauge participants’
tendency to endorse unlikely statements. Participants rated
these 13 items as either true or false. This measure is commonly
used in scale development to help identify faulty items or scales
(Cronbach’s α = .68).

Results  and  discussion — Goal  2 

Correlations  between  social  desirability  and  the  PCAP  beliefs 
were  assessed  to  account  for  students’  tendency  to  respond  in  a 
favorable  manner.  AMC  was  related  to  social  desirability 
(r = -A6, p < .001); thus, each of these regression analyses was
conducted  twice,  once  controlling  for  social  desirability  and  once 
without.  The  outcome  of  these  approaches  did  not  differ;  thus,  the 
results  without  the  social  desirability  control  are  presented. 

Hierarchical regression analyses were conducted using mean-centered
PCA and AMC scores. In each model, either PCA or
AMC was controlled for to determine the unique variance the other
accounted for in the dependent variable. The magnitude (β) and
statistical significance of the relationship were examined, along
with the amount of variance in the dependent variable explained by
the independent variable (ΔR²).

Table 4 displays the results of the analyses. All of the convergent
measures were significantly related to PCA and AMC (see
Table  4).  PCA  related  to  mastery  (M  =  2.98,  SD  =  .81)  and 
internal  attributions  (M  =  5.15,  SD  =  .65)  above  and  beyond 
AMC.  Comparably,  AMC  related  to  connectedness  (M  =  4.30, 
SD  =  .65)  and  causal  importance  (M  =  4.15,  SD  =  .98)  above  and 
beyond  PCA. 

With  regard  to  the  discriminant  measures,  the  results  supported 
all  but  one  of  the  hypothesized  outcomes.  PCA  did  not  relate  to 
connectedness;  however,  it  did  relate  to  causal  importance.  AMC 
did  not  relate  to  mastery  or  internal  attributions  as  expected. 
Likewise,  both  PCA  and  AMC  were  unrelated  to  interpersonal 
orientation (M = 3.54, SD = 1.05), extroversion (M = 5.85, SD =
1.49), and agreeableness (M = 7.19, SD = 1.25).

The  results  indicate  that  PCA  and  AMC  were  related  to  similar 
constructs  and  unrelated  to  dissimilar  constructs,  generally,  as 
expected.  This  provides  considerable  evidence  for  the  validity  of 
the  constructs  and  suggests  that  measuring  these  beliefs  could 
provide important information for students’ motivational and academic
outcomes. These results also help to empirically situate
PCAP  among  relevant  psychological  constructs. 

Goal  3:  (a)  Determine  Whether  the  Subjective 
Controllability  of  an  Event  Affected  Levels  of  PCA  or 
AMC  and,  if  so,  (b)  Whether  PCAP  Would  Relate  to 
Positive  Outcomes  When  Controlling  for  This  Context 

Instruments. 

PCAPS.  Students  completed  the  PCAPS  and  demographic 
questions  in  addition  to  the  following  measures. 

Adaptive attribution style. The ASQ was used to assess an
adaptive attribution style. A composite score, across dimensions
(internality, stability, and globality), was computed for only the
positive events. Higher scores indicated a more adaptive attribution
style (Cronbach’s α = .74).

Table 4

Indicators of Convergent and Discriminant Validity

                                     PCA               AMC
Measure                           β       ΔR²       β       ΔR²
PCA similar/AMC dissimilar
  Mastery                        .14**   .04**    -.03     .00
  ASQ (internal attributions)    .13*    .02*     -.06     .00
AMC similar/PCA dissimilar
  Connectedness                 -.07     .010      .20**   .05**
  Causal importance              .23**   .05**     .44**   .11**
PCA and AMC dissimilar
  Interpersonal orientation      .02    <.01       .02    <.01
  Extroversion                   .03    <.01       .00    <.01
  Agreeableness                 -.15     .01       .15     .01

Note. PCA = perceived control of attributions; AMC = awareness of the
motivational consequences of attributions; ASQ = Attribution Style
Questionnaire.
* p < .05. ** p < .001.


Autonomy. The GCOS was used to assess an autonomy orientation
(Deci & Ryan, 1985). It consists of 12 vignettes about
problems or situations that occur in life. For example, “A close
(same-sex) friend of yours has been moody lately, and a couple of
times has become very angry with you over ‘nothing.’ You
might. . . .” Following each vignette is an autonomy-related item
(i.e., “Share your observations with him/her and try to find out
what is going on for him/her”). Students rate the likelihood that
this would be their response to the situation (1 = very unlikely, 7 =
very likely; Cronbach’s α = .77).

Cognitive reappraisal. To assess the tendency to use cognitive
strategies following stressful events, the four-item positive
reinterpretation scale from the COPE was used (Carver et al., 1989).
Participants were instructed to report how frequently they use the
strategies in stressful events (items were as follows: “I look for
something good in what is happening,” “I try to see it in a different
light, to make it seem more positive,” “I learn something from the
experience,” and “I try to grow as a person as a result of the
experience”). Each item was rated on a 4-point Likert scale (1 =
I usually don’t do this at all, 4 = I usually do this a lot; Cronbach’s
α = .81).

Subjective well-being. Consistent with the approach used by
Vansteenkiste and colleagues (e.g., 2006), three measures were
used to assess well-being. The Positive Affect Negative Affect
Schedule (Watson et al., 1988) includes 10 positive (e.g., proud)
and 10 negative (e.g., irritable) mood items. Participants reported
how frequently they had experienced the mood in the last month.
Each item was rated on a 5-point Likert scale (1 = very slightly or
not at all, 5 = extremely; Cronbach’s α: positive affect = .86,
negative affect = .85). The five-item Satisfaction with Life Scale
(SWLS; Diener, Emmons, Larsen, & Griffin, 1985) asked participants
to rate their life satisfaction (e.g., “The conditions of my life
are excellent”). Each item was rated on a 7-point Likert scale (1 =
strongly disagree, 7 = strongly agree). To obtain an overall
well-being score, a composite was computed by standardizing and
summing positive affect and SWLS and then subtracting negative
affect (Cronbach’s α = .87).
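
The composite described above (standardize each scale, add positive affect and life satisfaction, subtract negative affect) can be sketched as follows. This is only an illustration under stated assumptions: the function names and the four participants' scores are hypothetical, and the population standard deviation is used for standardizing, whereas the original analysis may have used the sample standard deviation.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores to mean 0 and SD 1 (population SD; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def wellbeing_composite(positive_affect, life_satisfaction, negative_affect):
    """Return z(PA) + z(SWLS) - z(NA) for each participant."""
    z_pa = zscores(positive_affect)
    z_ls = zscores(life_satisfaction)
    z_na = zscores(negative_affect)
    return [pa + ls - na for pa, ls, na in zip(z_pa, z_ls, z_na)]


# Hypothetical scale means for four participants (not data from the study).
pa = [3.2, 4.1, 2.5, 3.8]
swls = [4.0, 6.2, 3.1, 5.5]
na = [2.9, 1.4, 3.6, 2.0]
composite = wellbeing_composite(pa, swls, na)
```

Standardizing first puts the 5-point affect ratings and the 7-point life satisfaction ratings on a common metric, so no single scale dominates the sum.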

Event-specific  perceived  control  of  the  attribution  process. 
The  9-item  ES-PCAPS  that  was  developed  in  a  pilot  study  by 
Fishman  (2014a)  was  used  to  measure  PCA  (five  items)  and  AMC 
(four  items)  at  the  event-level.  Students  were  given  the  following 
instructions: 

Sometimes  we  experience  things  in  our  lives  that  are  unexpected. 
Think  about  the  past  few  weeks  of  your  life.  Please  describe  a 
situation  in  which  something  unexpected  happened.  Preferably  a 
situation  that  made  you  think  or  that  you’re  still  thinking  about. 

Controllability of the event. To assess the controllability of
the event described by the participants, a single-item measure was
used. Following their described event, participants were asked,
“How much control did you have over the event you described
above?” They responded using the following options: 1 (no control),
2 (a little control), 3 (some control), 4 (a lot of control), and
5 (total control).

After addressing the controllability item, participants were given
the following instructions: “Now, think about what CAUSED that
event or situation. Why did it happen? The following statements
have to do with the situation/event you described above. Please
choose the response that best describes how you feel about each
statement.”

The ES-PCAPS items followed these instructions. ES-PCA
items were as follows: “I’m the one who determines why it
happened,” “Decisions about why it happened are under my control,”
“I have a great deal of control over determining why it
happened,”  “It’s  up  to  me  to  decide  why  it  happened,”  and  “I’m 
the  one  who  determines  what  caused  the  event.”  ES-AMC  items 
were  as  follows:  “My  decisions  about  why  the  event  happened 
affect  how  I  react  to  it,”  “The  way  I  think  about  the  event  affects 
how  I  react  in  similar  events  that  happen  in  the  future,”  “Changing 
my  mind  about  what  caused  the  situation  can  change  how  I  react 
to  it,”  and  “I  believe  the  way  I  explain  the  event  can  impact  how 
I  feel.”  For  each  item,  students  rated  their  level  of  agreement  on  a 
6-point  Likert  scale. 

To determine whether ES-PCAPS and PCAPS measure the
same construct at different levels, we conducted a CFA with the
ES-PCAPS and PCAPS items; this allowed us to examine their
convergence. A four-factor solution (ES-PCA, ES-AMC, PCA,
and AMC) fit the data best, χ²(162, N = 800) = 335.13, CFI = .97,
RMSEA = .04. The PCA factors (PCA and ES-PCA)
correlated strongly (r = .41), as did the AMC factors (AMC and
ES-AMC: r = .52), whereas the cross-correlations were only
moderate in strength (PCA and ES-AMC: r = .23; AMC and
ES-PCA: r = .21). This convergence suggests that the ES-PCAPS
measures beliefs that are closely related to the general expressions
of those constructs. In terms of reliability, the ES-PCA and ES-AMC
subscales were acceptable (Cronbach’s α: ES-PCA = .92;
ES-AMC = .76).

Results 

Contextual-dependency analysis. A one-way ANOVA was
conducted to assess the differences in levels of ES-PCA (M =
2.85, SD = 1.58) and ES-AMC (M = 4.32, SD = 1.17) between
controllable and uncontrollable events. Using the students’ responses
to the single item regarding the subjective controllability
of their described event (M = 2.28, SD = 1.36), the events were
separated into two categories: controllable events (one standard
deviation above the mean) and uncontrollable events (one standard
deviation below the mean). According to these parameters, students
described more uncontrollable events (n = 329) than controllable
events (n = 176). Those with missing ES-PCAPS responses
were deleted listwise from this analysis. Of the students’
events, 58% were directly related to academics, and 26% were
related to social events in a school setting.

The results were as expected: There was a statistically significant
difference in levels of PCA between controllable (n = 171,
M = 4.67, SD = 1.07) and uncontrollable (n = 320, M = 1.77, SD =
.99) events, F(1, 489) = 892.85, p < .001. Cohen’s effect size
(d = 2.81) suggested a very large practical significance. There was
also a statistically significant difference in levels of AMC between
controllable (n = 172, M = 4.95, SD = .84) and uncontrollable
(n = 325, M = 3.89, SD = 1.23) events, F(1, 495) = 94.76,
p < .001. This effect was smaller than that of PCA but still had
large practical significance (d = .98).
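
As a rough check on the size of these effects, Cohen’s d can be approximated from the reported group means, standard deviations, and sample sizes. The sketch below assumes the pooled-standard-deviation form of d; the source does not state which formula was used, so the values it produces are close to, but not identical with, the reported d = 2.81 and d = .98. The function name is illustrative.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Group statistics as reported in the text (controllable vs. uncontrollable).
d_pca = cohens_d(4.67, 1.07, 171, 1.77, 0.99, 320)
d_amc = cohens_d(4.95, 0.84, 172, 3.89, 1.23, 325)

print(f"PCA: d = {d_pca:.2f}; AMC: d = {d_amc:.2f}")
```

Under this assumption the PCA effect is still more than twice the AMC effect, which matches the pattern described in the text.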

Motivational implications of PCAP when accounting for
context. Given that levels of students’ PCA and AMC depended
on the subjective controllability of an event, we examined whether
the motivational benefits of PCAP would still be present if we
accounted for this subjective controllability. We conducted a series
of regression models with mean-centered ES-PCA and ES-AMC
variables. Again, β and ΔR² were used to assess associations.
Students’ subjective controllability of their described event was
included as a control variable in these regression models, as were
age, gender, and ethnicity.

The results were somewhat unexpected. ES-PCA was negatively
related to autonomy, whereas ES-AMC was associated with autonomy,
cognitive reappraisal, and an adaptive attribution style
(see Table 5). The ES-PCA × ES-AMC interaction was associated
with the same outcomes as ES-AMC. The only outcome that was
not related to the ES-PCAP variables was well-being.

Discussion — Goal  3.  The  third  research  goal  was  to  determine 
whether  (a)  the  subjective  controllability  of  an  event  affected 
levels  of  PCA  or  AMC  and  (b)  PCAP  would  relate  to  positive 
outcomes  when  controlling  for  this  context.  As  expected,  the 
results  indicated  that  levels  of  PCA  and  AMC  were  significantly 
higher  for  subjectively  controllable  events.  The  results  suggest  that 
participants’  perceived  capability  to  explain  what  caused  the  event 
was  tied  to  whether  they  felt  capable  of  controlling  the  event  in  the 
first place. Similarly, their awareness of the motivational consequences
of such explanations was also linked to the subjective
controllability of the event. Notably, the difference between controllable
and uncontrollable events was more than twice as large
for PCA as it was for AMC. Nevertheless, this indicates that when
an event is out of students’ control, they are unlikely to feel
capable of determining the cause of the event and are likely to be
unaware of how their causal thinking could affect them.

When controlling for this context, the results revealed that the
ES-PCA × ES-AMC interaction significantly and positively predicted
autonomy, cognitive reappraisal, and an adaptive attribution
style. At the event level, AMC appeared to explain most of the
variance in these outcomes. In sum, although the context of
the event was associated with levels of PCAP, it did not affect the
motivational properties of the beliefs, with the exception of well-being.
This research goal provided information about the PCAP
beliefs at the event level. The results suggest that the context of
specific events can influence students’ perceived control of the
attribution process; hence, measuring students’ general PCAP provides
a more realistic representation of how their PCAP impacts
outcomes on a daily basis and over time.


Table 5

Results of Hierarchical Regression Analysis, ES-PCAP

                                       ES-PCA          ES-AMC      ES-PCA × ES-AMC
Measure                              β      ΔR²      β      ΔR²      β      ΔR²
Autonomy                           -.13*   .01*    .25**   .05**   .16**   .02**
Well-being                          .04    .00     .00    <.01     .05     .00
COPE (cognitive reappraisal)       -.07    .00     .14**   .01**   .13**   .01**
ASQ (adaptive attribution style)   -.10    .00     .18**   .03**   .21**   .04**

Note. ES-PCA = event-specific perceived control of attributions; ES-AMC =
event-specific awareness of the motivational consequences of
attributions; ASQ = Attribution Style Questionnaire.
* p < .05. ** p < .01.

Goal 4: Examine the Motivational
Implications of PCAP

Instruments.  Autonomy,  cognitive  reappraisal,  attribution 
style,  subjective  well-being,  and  (general)  PCAP  were  assessed 
using  the  measures  described  above. 

Results 

Motivational implications of PCAP. A series of regression
models was conducted to examine the unique contribution (β and
ΔR²) of each independent variable on the dependent variables. In
each regression model, either PCA (M = 3.82, SD = 1.01) or
AMC (M = 4.51, SD = .83) was controlled for to determine their
unique contribution. Both PCA and AMC were controlled for in
models that assessed the variance explained by the PCA × AMC
interaction. Importantly, each of these regression models included
gender, age, and ethnicity as control variables.

With  the  exception  of  two  outcomes,  the  anticipated  results  were 
obtained  (see  Table  6).  AMC  predicted  autonomy  whereas  PCA 
did  not.  Conversely,  PCA  predicted  well-being  whereas  AMC  did 
not.  This  suggests  that  PCA  and  AMC  are  separate  yet  integral 
parts  of  PCAP  that  influence  different  outcomes.  Thus,  when 
examining  the  impact  of  PCAP,  it  is  important  to  consider  both 
PCA  and  AMC  as  they  have  distinct  motivational  properties. 
Additionally, PCA × AMC accounted for unique variance in cognitive
reappraisal, as expected.

Validation  of  the  PCAP  model.  To  verify  the  sequence  of  the 
measured  variables  in  the  proposed  PCAP  model,  a  path  model 
was  assessed  using  structural  equation  modeling  in  Mplus  (see 
Figure  2).  Based  on  the  results  of  the  previous  regression  analysis, 
the  expected  relationships  were  refined.  PCA  was  expected  to 
predict  well-being  and  cognitive  reappraisal.  AMC  was  expected 
to  predict  autonomy  and  cognitive  reappraisal.  The  conceptual 
model  also  posits  that  the  PCAP  beliefs  are  not  likely  to  influence 
attribution  style  unless  cognitive  actions  are  cued.  Thus,  only 
cognitive  reappraisal  was  expected  to  predict  attribution  style. 
Further, it was expected that cognitive reappraisal would mediate
the relationships of PCA and AMC with autonomy and well-being.
It was also expected that an adaptive attribution style would
mediate the relationship of cognitive reappraisal with autonomy
and well-being.


Table 6

Results of Hierarchical Regression Analysis, PCAP

                                        PCA            AMC          PCA × AMC
Measure                              β      ΔR²      β      ΔR²      β      ΔR²
Autonomy                            .02    <.01    .30**   .10**   .04     .00
Well-being                          .23**   .01**  .13     .00     .06     .00
COPE (cognitive reappraisal)        .05*    .01*   .10**   .02**   .07**   .01**
ASQ (adaptive attribution style)    .07*    .01*   .05*    .00*    .04     .00

Note. PCA = perceived control of attributions; AMC = awareness of the
motivational consequences of attributions; ASQ = Attribution Style
Questionnaire.
* p < .05. ** p < .01.


Figure 2. Empirical PCAP path model. PCA = perceived control of
attributions; AMC = awareness of motivational consequences of attributions.
Only statistically significant relationships with the demographic
variables are shown. * p < .05. ** p < .01.


In this model, age, gender, and ethnicity were included as
control variables (see Figure 2). The model had excellent fit, χ²(6,
N = 800) = 9.09, CFI = .994, TLI = .976, RMSEA = .025,
SRMR = .016. All but two paths were statistically significant:
adaptive attributions on AMC and PCA. Although these paths
were nonsignificant, a chi-square difference test revealed that this
model was not statistically different from a model that held the
paths equal to 0, Δχ²(2) = 5.82, p = .054, and the fit indices of the
more constrained model, χ²(8, N = 800) = 15.70, CFI = .986,
TLI = .955, RMSEA = .035, SRMR = .022, indicated the model
including the paths is preferred. These nonsignificant paths were
expected,  as  the  PCAP  beliefs  are  not  likely  to  influence  attribution 
style  unless  students  have  a  tendency  to  cognitively  reappraise. 
With  regard  to  the  demographic  variables,  age  was  significantly 
associated  with  AMC,  cognitive  reappraisal,  adaptive  attribution 
style,  and  autonomy.  Age  was  the  most  impactful  demographic 
variable.  Ethnicity  and  gender  were  significantly  associated  with 
PCA  and  well-being. 
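
The p value attached to the chi-square difference test above can be illustrated with a short calculation. For a chi-square distribution with exactly 2 degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The sketch takes the reported difference statistic of 5.82 as given; the function name is illustrative, and the shortcut applies only when df = 2.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variate with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Reported difference statistic for the two constrained paths (df = 2).
p_diff = chi2_upper_tail_df2(5.82)
print(f"p = {p_diff:.3f}")
```

The result falls just above the conventional .05 threshold, which is why the comparison is described as not statistically significant.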

Indirect effects and mediation. The indirect effects of the
expected mediated paths were examined using the Sobel (1982)
test. The results showed that each of the anticipated mediated paths
was statistically significant. The indirect effect of PCA to autonomy
as mediated by cognitive reappraisal was significant (z =
2.01, p = .037), as was the relationship between AMC and
autonomy as mediated by cognitive reappraisal (z = 3.24, p =
.001). Similarly, cognitive reappraisal mediated the relationship
between PCA and well-being (z = 2.10, p = .036) and the
relationship between AMC and well-being (z = 3.08, p = .002).

Also as expected, an adaptive attribution style played a mediating
role in the second tier of the model. An adaptive attribution
style  mediated  the  relationship  between  cognitive  reappraisal  and 
autonomy  (z  =  3.45,  p  =  .001)  and  cognitive  reappraisal  and 
well-being  (z  =  2.45,  p  =  .015). 
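
For readers unfamiliar with the Sobel test, the sketch below shows the first-order form of the statistic: the product of the two path coefficients divided by its approximate standard error, followed by a two-tailed p value from the standard normal distribution. The path coefficients and standard errors are hypothetical, because the source reports only the resulting z and p values; the function names are likewise illustrative.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """First-order Sobel statistic for the indirect effect a * b."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def two_tailed_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical estimates: a = predictor -> mediator, b = mediator -> outcome.
z = sobel_z(a=0.30, se_a=0.08, b=0.25, se_b=0.07)
p = two_tailed_p(z)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Applying `two_tailed_p` to a reported z converts it to a p value; results can differ slightly from published figures because of rounding.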

Discussion — Goal  4.  The  fourth  research  goal  was  to  examine 
the  motivational  implications  of  PCAP.  The  results  demonstrated 
that PCA and AMC significantly predicted outcomes as expected.
Where PCA failed to predict autonomy, AMC did; and where
AMC failed to predict well-being, PCA did. This provides further
evidence for their uniqueness as subconstructs. Additionally, the
PCA × AMC interaction significantly predicted cognitive reappraisal
beyond both constructs alone. This indicates that students
who  adopt  both  beliefs  are  more  likely  to  engage  in  cognitive 
reappraisal  following  stressful  situations  than  students  who  adopt 
one  or  neither  of  the  beliefs. 

The results also demonstrated that the PCAP model fit the data
well. Using the full sample and controlling for demographic
variables, the PCAP model demonstrated excellent fit. As
expected, neither PCA nor AMC significantly led to an adaptive
attribution style; however, both were significantly associated with
cognitive reappraisal, which led to adaptive attributions. Thus, in
line with the conceptual model, the function of PCAP to influence
attribution style is largely due to the cognitive strategies facilitated
by the beliefs. In other words, attribution style is not likely
influenced unless students have a tendency to cognitively intervene in
the attribution process.

The results also revealed that cognitive reappraisal significantly
mediated the relationship between the PCAP variables and autonomy
and the relationship between the PCAP variables and well-being.
Similarly, in the second tier of the model, an adaptive
attribution style played a significant role in mediating the relationship
between cognitive reappraisal and the dependent variables.
The results also showed that students are more likely to adopt the
AMC perspective as they age. This makes sense, as experience is
a likely source of meta-cognitive knowledge. Overall, the results
support the PCAP model whereby students who adopt the PCAP
beliefs are more likely to engage in cognitive strategies that
promote adaptive attributions, which lead to autonomy and subjective
well-being.

General  Discussion 

Students’  overall  experience  depends  largely  on  how  they  deal 
with  stressful  events.  The  attribution  process  is  perhaps  the  most 
fundamental  and  automatic  way  of  dealing  with  such  events.  Thus, 
it  was  theorized  that  students’  perceived  control  of  the  attribution 
process  plays  an  important  role  in  facilitating  and  promoting 
adaptive  outcomes.  The  present  study  took  an  empirical  approach 
to  examine  these  claims  and  to  offer  evidence  for  the  consideration 
of attribution-related beliefs as a complement to attribution theory.

We found that perceived control of attributions and awareness
of the consequences of attributions were, in fact, separate constructs
and that they differentially related to outcomes. We also
found that levels of PCA and AMC differed significantly between
controllable and uncontrollable events; when accounting for
this factor, however, the PCAP beliefs still positively related to students’
autonomy, tendency for cognitive reappraisal, and adaptive attribution
style. Finally, we found support for the PCAP model, in
which perceived control of the attribution process facilitated a
tendency for cognitive reappraisal, which promoted an adaptive
attribution style, which related to autonomy and subjective well-being.

Theoretical  Implications 

In each phase of this study, students who adopted both the PCA
and AMC perspective experienced more favorable outcomes than
students who did not. This demonstrates that there are individual
differences in attribution-related control beliefs and that these
differences have a measurable impact on students’ motivation.
We believe these findings have meaningful theoretical implications.

According to the literature, students have an innate predisposition
to perceive control over their environment, and when that
control is threatened, they engage in internal strategies to regain a
sense of control. The present study contributes to the literature that
features this synergistic striving for control. Although the conceptualization
of PCAP stems from existing control theories, it differs
from past conceptualizations in that it represents a perceived
control of an internal phenomenon. This meta-cognitive component
adds an element to the study of personal control beliefs that
has not often been explored.

Conceptually,  perceived  control  of  the  attribution  process  is 
linked  to  autonomy  because  when  students  feel  it  is  “up  to  them” 
to  determine  why  things  happen,  they  are  more  inclined  to  exercise 
control  of  the  causal  thought  process  rather  than  being  at  the  mercy 
of  their  automatic  patterns.  Students  who  perceive  control  of  their 
attribution  process  feel  that  they  are  in  the  “driver’s  seat”  and  can 
take  more  deliberate  action,  at  least  cognitively,  which  engenders 
a  sense  of  autonomy.  This  is  especially  important  for  college 
students  who  are  navigating  novel  academic  contexts  that  require 
high  levels  of  independence  and  self-reliance.  A  recent  study 
illustrated  the  fundamentality  of  autonomy,  finding  that  a  sense  of 
autonomy  satisfies  the  desire  for  control  significantly  more  than 
having  influence  over  others  (Lammers,  Stoker,  Rink,  &  Galinsky, 
2016). 

The present findings also support existing research that suggests
personal beliefs about one’s capability to determine the cause of an
event impact cognitive processes and are linked to well-being
(Tobin & Raymundo, 2010). Given that Weiner’s (2010)
attribution-based  model  acknowledges  that  personal  factors  impact 
the  causal  thought  process,  considering  students’  personal  beliefs 
about  their  own  causal  thought  is  a  natural  extension  of  attribution 
theory  that  may  broaden  understanding  of  how  this  fundamental 
process  impacts  student  outcomes. 

The  Contextual-Dependency  of  PCAP 

Evaluating perceived control of the attribution process at the
event level demonstrated that students’ perceived control of attributions
and awareness of the motivational consequences of attributions
depend on whether the event is considered controllable.
Students  were  more  likely  to  perceive  control  over  their  causal 
reasoning  if  the  event  itself  was  under  their  control  (e.g.,  failed 
test);  whereas  when  an  event  was  uncontrollable  (e.g.,  death  of  a 
loved  one),  participants  reported  significantly  lower  perceptions  of 
control  over  explaining  why  the  event  happened.  This  was  also 
true  regarding  students’  awareness  of  how  their  attributions  would 
affect  them  psychologically  and  behaviorally. 

These results support past findings that suggest attributions are
intrinsically tied to the events for which they are made (Berntsen
& Rubin, 2006). Students may be inclined to associate all aspects
of an event and see them as a single entity, which could hinder
their ability to disassociate the event from the process of making
causal attributions for the event. It may also be that controllable
events, such as gaining weight, have more obvious causes (e.g.,
stopped exercising), in which case, determining why it happened
may feel well within students’ control.

Because PCA and AMC varied significantly by context, assessing
students’ PCAP beliefs based on a single event can misrepresent
their general attribution-related beliefs, or those beliefs on
average.  Attributions  occur  in  all  aspects  of  life  and  are  situated  in 
unique  contexts  each  time  the  process  is  initiated.  Thus,  although 
the  controllability  of  the  stressful  event  had  significant  impact  on 
PCAP,  it  is  merely  one  aspect,  from  one  event,  from  one  time  in 
the  students’  lives.  This  supports  findings  from  the  pilot  study  by 
Fishman  (2014a)  that  suggest  a  general  approach  to  measuring 
these  meta-cognitive  beliefs  provides  a  broader  and  more  realistic 
representation  of  students’  perceived  control  of  the  attribution 
process.  Hence,  a  general,  all-encompassing  approach  to  measuring
these  beliefs  is  preferred.

Nevertheless,  when  evaluating  the  contextual  dependency  of 
PCAP,  it  is  important  to  consider  the  interaction  term,  as  it  appears 
to  have  a  more  prominent  effect  at  the  event-level  than  at  the 
general-level.  Importantly,  these  results  suggest  that  even  when 
students  are  faced  with  uncontrollable  events,  they  benefit  from 
perceiving  control  of  the  attribution  process.  Conceptually,  this 
indicates  that  PCAP  satiates  a  student’s  need  for  control,  whereby 
perceiving  control  of  a  cognitive  aspect  (i.e.,  attribution  process)  of 
an  event  is  adequate  for  maintaining  autonomy  and  well-being. 
This  notion  resembles  Rothbaum  et  al.’s  (1982)  original  senti¬ 
ments  regarding  secondary  control. 

Inferences  From  the  PCAP  Model 

The  results  from  the  empirical  examination  of  the  PCAP  model 
suggest  that  perceived  control  of  the  attribution  process  plays  an 
adaptive  role  in  the  causal  thought  process.  Students  who  felt  it 
was  “up  to  them”  to  determine  why  events  happen  and  were  aware 
of  the  motivational  consequences  of  those  determinations  were 
significantly  more  likely  to  report  using  cognitive  strategies  and 
more  likely  to  experience  autonomy  and  well-being.  Although  we 
do  not  infer  causal  relationships,  we  subjected  the  data  to  a 
psychometrically  strict  path  model  that  indicated  students  who 
perceive  control  of  the  attribution  process  are  more  likely  to 
engage  in  cognitive  reappraisal  strategies,  which  led  to  autonomy 
and  well-being  as  mediated  by  an  adaptive  attribution  style. 

This  supported  the  conceptual  argument  that  students  who 
adopt  the  PCAP  beliefs  are  more  likely  to  disengage  from  the 
causal  thought  process.  Students’  underlying  belief  that  it  is  “up 
to  them”  to  determine  why  events  happen,  and  their  awareness  of 
how  their  thinking  about  the  event  affects  them,  may  interrupt  the
automaticity  of  the  attribution  process,  putting  the  student  in  a 
position  to  influence  the  process.  Hence,  perceived  control  of  the 
attribution  process  is  a  personal  factor  that  enriched  students’ 
causal  thought  patterns  and  increased  the  likelihood  of  positive 
motivational  outcomes. 

Educational  and  Practical  Implications 

Importantly,  there  are  practical  implications  to  consider.  As 
demonstrated  here,  perceiving  oneself  as  having  control  over  a 
cognitive  process,  especially  when  primary  control  is  threatened,  is 
an  advantageous  perspective  that  has  the  potential  to  safeguard 
students’  well-being  and  promote  the  experience  of  autonomy. 


Given  that  students  often  encounter  stressful  events  in  the  aca¬ 
demic  and  social  contexts  of  school,  PCAP  may  shed  light  on  why 
some  persist  while  others  are  motivationally  debilitated  by  them. 
The  PCAPS  allows  educators  to  identify  students  who  would 
benefit  from  a  change  in  these  meta-cognitive  attribution-related 
beliefs. 

Ultimately,  interventions  designed  to  promote  the  perceived 
control  of  the  attribution  process  can  be  implemented  to  bolster 
students’  autonomy  and  well-being.  These  interventions  would 
likely  be  designed  using  elements  from  counseling  psychology  and 
attributional  retraining  (AR).  In  combining  these  elements,  the
intervention  can  not  only  educate  students  about  the  attribution
process  but  also  promote  an  awareness  and  perceived  control  of  the
process.  Those  who  understand  these  principles  are  less  likely  to
operate  in  their  default  setting,  less  likely  to  answer  the  “Why?”
question  with  an  automatic  response,  and  less  likely  to  be  at  the
mercy  of  their  learned  patterns.  Because  students  make  attributions
in  all  aspects  of  life,  these  interventions  can  produce  important 
realizations  for  students  that  lead  to  improved  motivational  out¬ 
comes. 

In  support  of  this,  researchers  have  recently  begun  investigating 
the  impact  of  mindfulness-based  cognitive  therapy  (MBCT)  in 
educational  settings.  MBCT  is  designed  to  promote  awareness  and 
change  the  relationship  with  unwanted  thoughts  so  that  partici¬ 
pants  are  less  inclined  to  act  automatically  or  negatively  in  stress¬ 
ful  situations  (Ma  &  Teasdale,  2004).  Studies  have  shown  that  this 
type  of  intervention  can  significantly  reduce  teachers’  and  students’
stress.  Further,  interview  data  have  revealed  that  teachers  and
students  who  participate  in  MBCT  report  greater  awareness  and
perceived  control  of  their  causal  thought,  resulting  in  more  positive
reframing  of  stressful  situations  (Mahfouz  &  Levitan,  2016;
Murphy,  2016).

Why  is  it  important  for  students  to  perceive  control  of  the 
attribution  process?  Growing  evidence  demonstrates  the  interrela¬ 
tion  of  students’  social  and  academic  motivation,  and  their  shared 
impact  on  educational  outcomes  (e.g.,  Makara,  2016).  Students 
who  perceive  control  of  the  attribution  process  are  likely  to  benefit 
in  the  academic  and  social  contexts  of  school,  as  stressful  events 
occur  in  both  environments.  Additionally,  demanding  learning 
environments  such  as  high  school  and  college  present  academic 
challenges  that  are  often  followed  by  self-reflection,  or  causal 
thought  (e.g.,  “Why  did  I  take  this  class?”  or  “Why  did  I  fail  the 
midterm?”).  Research  has  shown  that  adaptive  attributions  for 
these  types  of  events  are  related  to  positive  achievement  outcomes 
(e.g.,  Platt,  1988)  and  can  remediate  the  negative  effects  of  specific 
learning  contexts  such  as  poor  teaching  (Perry  &  Magnusson, 
1989).  As  the  present  study  demonstrated,  students’  perceived 
control  of  the  attribution  process  can  be  instrumental  in  promoting 
a  tendency  for  cognitive  reappraisal  which  facilitates  an  adaptive 
attribution  style. 

Limitations  and  Future  Directions 

The  present  study  included  self-report  data,  which  have  several
limitations,  namely  the  absence  of  behavioral  outcome  measures.
Examination  of  behavioral  outcomes  would  broaden  the  understanding
of  the  impact  of  perceived  control  of  the  attribution
process.  The  data  in  the  present  study  are  cross-sectional,  which  did
not  allow  for  examination  of  students’  PCAP  over  time.  Experimental
and  longitudinal  studies  would  enable  investigation  into  the
causal  properties  of  PCAP  and  long-term  implications  for  students. 
This  study  had  a  gender-skewed  student  sample  (mostly  female)
with  some  nontraditional  students  (55%  were  ages  18  through  22). 
The  students  in  this  study  were  recruited  from  only  one  university; 
thus,  extrapolating  these  results  outside  of  this  university  requires 
qualification.  However,  the  university  from  which  this  sample  was 
recruited  houses  a  diverse  student  population.  The  students  in  this 
study  come  from  a  less  selective  university  which,  compared  to 
highly  selective  universities,  generates  a  more  diverse  student 
body  with  regard  to  socioeconomic  status  and  academic  performance.
Additionally,  the  courses  from  which  the  students  were
recruited  were  less  selective,  reflecting  this  diversity,  and  are  perhaps
more  representative  of  North  American  populations  than  highly
selective  courses  or  universities.  The  perceived  control  of  the
attribution  process  scale  used  in  this  study  produced  valid  and 
reliable  results;  however,  more  examination  is  needed  to  differen¬ 
tiate  the  PCAPS  items  from  similar  scale  items,  including  those 
used  in  this  study  (e.g.,  causal  importance,  mastery).  Future  studies 
that  focus  on  potential  item  overlap  and  further  scale  development 
are  recommended. 

Additionally,  future  studies  with  more  diverse  samples  are  en¬ 
couraged  to  examine  PCAP  across  ethnic  groups.  Existing  research 
suggests  that  these  types  of  cognitive  strategies  differ  between 
those  in  Western  and  Eastern  cultures  (Morling  &  Fiske,  1999; 
Sasaki  &  Kim,  2011).  Past  research  asserts  that  religious  beliefs  in
Eastern  cultures  emphasize  a  reliance  on  a  higher  power,  or  an 
external-oriented  coping  style,  which  may  discourage  PCAP  be¬ 
liefs  as  PCAP  is  an  internal-oriented  coping  source.  Thus,  future 
research  that  examines  the  role  of  religion  in  students’  PCAP 
beliefs,  or  coping  locus,  is  encouraged.  Moreover,  a  student’s 
childhood  environment  can  also  affect  his  or  her  sense  of  control. 
Mittal  and  Griskevicius  (2014)  demonstrated  that  those  who  experienced
a  poor  childhood  were  more  likely  to  develop  an  environmental
uncertainty  that  led  to  a  lower  sense  of  control,  whereas
those  from  wealthier  childhoods  were  less  impulsive  and  reported
having  more  control  over  their  environment.  This  suggests  that 
students’  perception  of  control  is  shaped  by  contextual  factors  as 
well  as  personal  beliefs,  a  concept  that  warrants  examination  with 
respect  to  students’  causal  thinking. 

Given  the  contextual-dependence  of  the  PCAP  beliefs,  further 
studies  may  examine  the  influence  of  other  types  of  events  such  as 
failure  and  success  events,  as  well  as  differences  in  perspective
(e.g.,  actor  vs.  observer).  Additionally,  the  present  conceptualization 
of  PCAP  involves  only  causal  attributions;  there  are,  however,  other 
attributions  that  individuals  could  perceive  themselves  as  the  “one 
who  determines”  such  as  meaning  attributions.  Dole  and  Sinatra 
(1998)  suggested  that  individuals  are  more  likely  to  engage  deeply  in 
processing  information  if  the  information  is  personally  meaningful. 
Following  an  important  event,  students  are  likely  to  engage  in  a 
cognitive  process  to  determine  whether  the  event  is  personally  mean¬ 
ingful.  Students  who  perceive  control  over  this  type  of  attribution 
would  likely  endorse  a  statement  such  as,  “Ultimately,  I’m  the  one
who  determines  if  it  is  meaningful.”  Thus,  the  present  construct  could
be  expanded  to  include  other  types  of  attributions  (e.g.,  good-bad).
Although  the  perspective,  “I’m  the  one  who  determines  ...”  is  likely
beneficial  in  a  multitude  of  circumstances,  PCAP  is  specific  to  causal 
attributions  to  more  clearly  address  the  perceived  control  of  an  inter¬ 
nal  process. 


Concluding  Remarks 

The  present  study  examined  students’  attribution-related  control 
beliefs  and  found  that  those  who  perceived  the  capability  to  influence 
their  attributions  and  were  aware  of  the  motivational  consequences  of 
their  attributions  were  more  likely  to  experience  favorable  motiva¬ 
tional  outcomes.  Even  in  situations  of  uncontrollability,  those  who 
perceived  control  over  their  attribution  process  retained  a  sense  of 
autonomy,  viewing  themselves  as  the  “one  who  determines  why 
things  happen.”  Conceptually,  students  who  adopt  these  beliefs  use 
them  to  meta-cognitively  position  themselves  to  help  withstand  the 
onslaught  of  uncertainty  and  distress  that  all  students  inevitably  en¬ 
counter.  Perceiving  oneself  as  having  control  over  a  cognitive  process, 
especially  when  perceived  control  is  threatened,  is  an  advantageous 
perspective  that  has  the  potential  to  safeguard  students’  well-being 
and  promote  the  experience  of  autonomy.

Meta-Analytic  Evidence  From  Multiple  Educational  Outcomes

Achievement  goal  theory  originally  defined  performance-approach  goals  as  striving  to  demonstrate
competence  to  outsiders  by  outperforming  peers.  The  research,  however,  has  operationalized  the  goals 
inconsistently,  emphasizing  the  competence  demonstration  element  in  some  cases  and  the  peer  compar¬ 
ison  element  in  others.  A  meta-analysis  by  Hulleman  et  al.  (2010)  discovered  that  students’  academic 
achievement  was  negatively  predicted  by  performance-approach  goals  that  focus  on  appearing  talented, 
but  positively  predicted  by  performance-approach  goals  that  focus  on  outperforming  peers.  The  present 
meta-analysis  extends  that  pattern  to  numerous  other  educational  outcomes,  such  as  competence  per¬ 
ceptions  and  self-regulation.  It  does  so  while  also  removing  a  confound  (i.e.,  the  sample’s  mean  age)  that 
varies  systematically  along  with  the  type  of  performance-approach  goal  measure  employed  in  studies. 
Discussion  explores  when  and  why  the  2  types  of  performance-approach  goals  are  most  likely  to  diverge 
versus  converge.  It  also  considers  2  potential  directions  that  goal  theory  can  take  to  incorporate  the  2 
performance-approach  goals. 

Keywords:  achievement  goals,  meta-analysis,  performance-approach  goals 


What  motivates  students?  Are  some  motivations  better  than 
others?  If  so,  what  can  be  done  to  promote  them?  Achievement 
goal  theory  traces  the  answers  to  students’  goals  (Dweck,  1986; 
Nicholls,  1984).  It  contrasts  mastery  goals  and  performance  goals, 
each  reflecting  a  unique  reason  for  engaging  in  a  task.  Students 
pursuing  mastery  goals  strive  to  develop  competence  by  maximiz¬ 
ing  their  potential,  improving  on  prior  success,  or  simply  learning 
to  their  heart’s  content.  Those  pursuing  performance  goals,  by  contrast,
strive  to  demonstrate  existing  competence,  typically  by  outper¬ 
forming  peers  or  by  matching  their  success  with  less  effort.  Each 
goal  is  approach-based,  meaning  that  it  strives  toward  attaining
success.  Theorists  later  added  to  each  goal  an  avoidance  counter¬ 
part  that  strives  to  avoid  failure:  mastery-avoidance  goals  that  aim 
to  prevent  a  decline  in  skill  or  a  failure  to  learn,  and  performance- 
avoidance  goals  that  aim  to  prevent  a  display  of  low  competence 
relative  to  others  (Elliot  &  Harackiewicz,  1996;  Elliot  & 
McGregor,  2001;  Middleton  &  Midgley,  1997;  Skaalvik,  1997; 
Vandewalle,  1997). 

In  testing  achievement  goal  theory,  most  studies  have  corre¬ 
lated  students’  self-reported  goals  with  their  achievement  or 
other  learning-related  outcomes.  Each  goal’s  effects  are  well- 
documented  in  several  reviews  (e.g.,  Moller  &  Elliot,  2006; 
Senko,  Hulleman,  &  Harackiewicz,  2011)  and  meta-analyses  (Baranik,
Stanley,  Bynum,  &  Lance,  2010;  Cellar  et  al.,  2011;  Huang,
2011a,  2011b;  Hulleman,  Schrager,  Bodmann,  &  Harackiewicz,
2010;  Lochbaum  &  Gottardy,  2015;  Payne,  Youngcourt,  &
Beaubien,  2007;  Van  Yperen,  Blaga,  &  Postmes,  2014,  2015;
Wirthwein,  Sparfeldt,  Pinquart,  Wegerer,  &  Steinmayr,  2013). 
Mastery-approach  (MAp)  goals  provide  a  wide  array  of  benefits — 
for  example,  positive  emotions  (e.g.,  positive  affect  and  hope)  and 
interest,  elaborative  learning  strategies  and  effective  self¬ 
regulation,  and  help-seeking  and  cooperativeness,  to  list  just  a  few. 
The  two  avoidance  goals — mastery-avoidance  (MAv)  and 
performance-avoidance  (PAv) — have  shown  sporadic  benefits 
too.  For  example,  some  studies  have  found  that  MAv  goals  boost  interest
(see  Baranik  et  al.,  2010),  and  other  recent  studies  suggest  that 
avoidance  goals  may  prove  beneficial  in  late  adulthood  (Senko  & 
Freund,  2015)  or  for  East  Asian  students  (King,  2016).  Notwith¬ 
standing  those  limited  benefits,  however,  both  of  the  avoidance 
goals  have  proven  maladaptive  overall,  each  consistently  produc¬ 
ing  a  wide  array  of  detrimental  outcomes,  such  as  negative  emo¬ 
tions  (e.g.,  anxiety  and  hopelessness),  poor  learning  strategies, 
unwillingness  to  seek  help,  poor  health,  openness  to  cheating,  and 
so  forth. 

Performance-approach  (PAp)  goals,  by  contrast,  produce
highly  inconsistent  and  even  contradictory  effects:  for  example, 
anxiety  and  negative  affect,  but  also  pride  and  positive  affect  (see 
Huang,  2011a);  self-consciousness  (e.g.,  Heintz  &  Steele-Johnson,
2004),  but  also  task  focus  (e.g.,  Lee,  Sheldon,  &  Turban,  2003); 
effort-withdrawal  and  self-handicapping  (e.g.,  Midgley,  Arunku- 
mar,  &  Urdan,  1996),  but  also  high  effort  intensity  and  challenge¬ 
seeking  (see  Senko,  Durik,  Patel,  Lovejoy,  Valentiner,  &  Stang, 
2013);  “shallow”  learning  strategies  (i.e.,  rehearsal),  but  also
“deep”  strategies  (i.e.,  elaboration)  and  self-regulation  (see  Payne 
et  al.,  2007).  Perhaps  the  PAp  goal’s  most  consistent  effect— and 
certainly  its  most  controversial  (Brophy,  2005;  Harackiewicz,  Bar¬ 
ron,  &  Elliot,  1998;  Midgley,  Kaplan,  &  Middleton,  2001;  Senko 
et  al.,  2011) — has  been  on  academic  achievement.  PAp  goals 
positively  predict  achievement  (Hulleman  et  al.,  2010).  What 
explains  this  bewildering  pattern  for  PAp  goals?

Recent  research  suggests  that  it  may  trace  to  inconsistencies  in
how  researchers  define  PAp  goals  (Hulleman  et  al.,  2010;  Senko  & 
Tropiano,  2016).  The  remainder  of  this  paper  further  probes  this 
possibility.  We  present  a  meta-analysis  that  tests  whether  PAp  goal 
effects  on  various  educational  outcomes  depend  on  how  the  goal  is 
operationally  defined.  Note  that  we  shall  narrow  our  focus  solely 
onto  PAp  goals,  for  they  are  the  ones  that  have  produced  mixed 
effects  and  spurred  controversy.  MAp,  MAv,  and  PAv  goals,  by 
contrast,  all  provide  largely  consistent  results,  their  only  notewor¬ 
thy  controversy  being  about  the  prevalence  of  MAv  goals  (Hulle¬ 
man  &  Senko,  2010). 

The  Performance  Goal’s  Essence:  Appearing  Talented 
or  Outperforming  Others? 

Achievement  goal  theory  is  now  more  than  30  years  old,  yet 
theorists  disagree  over  what  exactly  a  PAp  goal  is  (Grant  &
Dweck,  2003;  Kaplan  &  Maehr,  2007;  Urdan,  1997;  see  Elliot, 
2005,  for  a  historical  review).  Is  its  essence  to  demonstrate  com¬ 
petence?  Or  to  outperform  peers? 

Goal  theory’s  founders  emphasized  competence  demonstration 
(Dweck,  1986;  Maehr,  1984;  Nicholls,  1984).  From  their  view,  an 
achievement  goal  represents  the  broad  reason  for  engaging  in  an 
achievement  task:  either  to  develop  competence  (MAp  goals)  or  to 
demonstrate  competence  (PAp  goals).  But  they  also  featured  peer 
comparison  in  their  definitions  to  varying  degrees.  This  practice 
reflects  an  assumption  that  the  twin  desires  to  appear  competent 
and  to  outperform  peers  naturally  cohere;  arousing  one  will  prime 
the  other.  For  example,  contexts  emphasizing  competition  may 
prompt  students  to  strive  to  demonstrate  high  ability  compared 
with  others  (Ames,  1992),  and,  conversely,  students  eager  to 
showcase  high  ability  may  try  to  do  so  by  outperforming  others 
(Nicholls,  1984).  The  early  research  therefore  defined  PAp  goals 
with  either  or  both  of  these  two  elements,  tacitly  assuming  they 
should  produce  the  same  effects.  Still,  according  to  the  goal 
theory’s  initial  framework,  called  the  “goal  orientation  model,” 
competence  demonstration  takes  the  lead  role  in  this  partnership, 
and  it  should  create  a  maladaptive  orientation  to  the  task — one 
marked  by  self-consciousness,  anxiety,  and  challenge-avoidance. 

Elliot  (Elliot  &  Thrash,  2001)  later  offered  a  rival  framework 
called  the  “goal  standard  model.”  In  line  with  classic  theories  of 
goals,  he  proposed  that  an  achievement  goal  must  emphasize 
competence  attainment,  defined  either  with  personal  standards 
such  as  improving  on  prior  success  (MAp  goals)  or  with  interper¬ 
sonal  standards  (PAp  goals).  From  that  view,  outperforming  peers 
is  the  PAp  goal’s  true  essence.  Equally  important,  students  may 
have  various  motives  for  trying  to  outperform  peers.  One  motive 
might  be  to  showcase  their  talent,  much  like  the  goal  orientation 
model  assumes.  But  students  might  also  strive  to  outperform  peers 
for  other  reasons  unrelated  to  impression  management,  such  as  the 
enjoyment  of  the  challenge  (Urdan  &  Mestas,  2006;  Vansteenk- 
iste,  Lens,  Elliot,  Soenens,  &  Mouratidis,  2014).  So,  according  to 
the  goal  standard  model,  we  cannot  assume  that  outperforming 
others  and  appearing  talented  are  interchangeable  desires.  They 
may  overlap  but  still  are  conceptually  distinct. 

It  appears  they  may  be  empirically  distinct  too  (Edwards,  2014; 
Hackel,  Jones,  Carbonneau,  &  Mueller,  2016;  Senko  &  Tropiano,
2015;  Warburton  &  Spray,  2014).  Hulleman  et  al.  (2010)  demonstrated
this  best  in  a  meta-analysis  of  achievement  goal  correlations
with  academic  achievement  and  interest.  Their  sweep  of
nearly  100  studies  identified  numerous  PAp  goal  measures,  some 
widely  used  (e.g.,  Button,  Mathieu,  &  Zajac,  1996;  Elliot  & 
McGregor,  2001;  Midgley  et  al.,  2001;  Vandewalle,  1997)  and 
others  custom-made  for  specific  studies.  Hulleman  et  al.  coded  the 
thematic  content  of  every  item  in  each  measure,  and  then  classified 
each  goal  measure  according  to  its  dominant  theme.  In  some  PAp 
goal  measures,  the  predominant  theme  is  to  appear  talented  (e.g., 
Vandewalle,  1997).  But  in  others,  it  is  to  outperform  peers  (e.g., 
Elliot  &  Church,  1997).  This  distinction  matters.  In  Hulleman  et
al.’s  meta-analysis,  PAp  goals  predicted  low  academic  achievement
(r  =  -.14)  when  the  dominant  theme  was  to  appear  talented,
but  they  predicted  achievement  gains  (r  =  .14)  when  the  dominant
theme  was  to  outperform  others.  Given  their  potential  differences, 
we  will  refer  to  these  types  of  PAp  goals  as  “appearance  goals” 
and  “normative  goals”  for  the  remainder  of  the  paper. 
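
The coding step described above, in which each item is assigned a theme and the measure is then classified by its dominant theme, can be illustrated with a minimal sketch. It is not Hulleman et al.'s actual coding scheme: the function name, the theme labels, the "mixed" rule for ties, and the example item codes are all assumptions made for illustration.

```python
from collections import Counter

def dominant_theme(item_themes):
    """Classify a goal measure by the most frequent theme among its items.

    item_themes: list of theme labels, one per item (hypothetical codes).
    Returns the most common label, or "mixed" when the top two themes tie
    (the tie rule is an assumption, not taken from the source).
    """
    if not item_themes:
        raise ValueError("a measure needs at least one coded item")
    counts = Counter(item_themes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "mixed"
    return counts[0][0]

# Hypothetical item codes for two invented measures
measure_a = ["appearance", "appearance", "normative", "appearance"]
measure_b = ["normative", "normative", "normative"]

print(dominant_theme(measure_a))
print(dominant_theme(measure_b))
```

Under this rule, a measure whose items mostly concern looking talented is treated as an appearance goal measure, and one whose items mostly concern outperforming peers is treated as a normative goal measure.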

Though  revealing,  Hulleman  et  al.’s  (2010)  meta-analysis  is 
limited  in  two  ways  that  necessitate  follow-up:  (a)  their  effect  was 
restricted  to  only  one  outcome  (academic  achievement),  and  (b)  it 
might  be  caused  by  an  underlying  confound.  We  elaborate  on  each
issue  below  and  then  test  them  in  a  new  meta-analysis. 
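
For readers unfamiliar with how a meta-analysis compares correlations across subgroups, the following sketch pools correlations separately for each type of PAp goal measure. It assumes a simple fixed-effect model with Fisher's z transformation and weights of n - 3; the source does not state that this is the authors' procedure. The function name and every (r, n) pair are invented for illustration and are not data from any study.

```python
import math

def pooled_correlation(studies):
    """Fixed-effect pooled correlation from (r, n) pairs.

    Each r is converted to Fisher's z, weighted by n - 3 (the inverse of
    the sampling variance of z), averaged, and converted back to r.
    Every study must have n > 3.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        weight = n - 3
        weighted_sum += weight * math.atanh(r)
        total_weight += weight
    return math.tanh(weighted_sum / total_weight)

# Hypothetical (r, n) pairs, grouped by the measure's dominant theme
appearance_studies = [(-0.20, 150), (-0.10, 300), (-0.12, 200)]
normative_studies = [(0.10, 250), (0.18, 180), (0.15, 220)]

print(round(pooled_correlation(appearance_studies), 3))
print(round(pooled_correlation(normative_studies), 3))
```

If the pooled correlations for the two subgroups differ in sign or size, the goal's operational definition is a candidate moderator. A full analysis would also test that difference formally and check for confounds.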

Extension  1:  Do  Appearance  and  Normative  Goals 
Differ  on  Other  Outcomes? 

Hulleman  et  al.  (2010)  restricted  their  meta-analysis  to  only  two 
educational  outcomes,  academic  achievement  and  interest.  Of  the 
two,  only  achievement  showed  a  difference  between  the  appear¬ 
ance  and  normative  types  of  PAp  goals.  Does  that  finding  gener¬ 
alize  to  other  important  outcomes,  or  is  it  restricted  to  academic 
achievement?  It  is  a  critical  question.  If  the  finding  generalizes  to 
numerous  outcomes,  then  the  field  must  revisit  how  best  to  con¬ 
ceptualize  PAp  goals — a  thorny  but  necessary  task  that  may  entail 
revising  achievement  goal  theory.  If,  however,  the  finding  is 
confined  to  academic  achievement,  such  efforts  may  be  unneces¬ 
sary. 

Academic  achievement  clearly  ranks  high  in  importance  for  its 
power  to  catalyze  student  motivation  and  its  role  as  a  gateway  to 
new  opportunities.  Yet  it  may  also  be  a  flawed  indicator  of 
learning — or  at  least  long-term  quality  of  learning — because 
course  assignments  sometimes  demand  only  superficial  topic 
knowledge  (Kohn,  2000).  High  performance  can  therefore  occur 
without  clear  learning  (i.e.,  relatively  permanent  change  in  behav¬ 
ior  or  knowledge),  and,  worse,  genuine  gains  in  learning  can  go 
undetected  by  performance  measures  (Soderstrom  &  Bjork,  2015). 
Accordingly,  one  might  argue  that,  although  normative  goals  fa¬ 
cilitate  achievement,  they  fail  to  aid  learning  beyond  transient  and 
superficial  levels  (e.g.,  Midgley  et  al.,  2001).  Perhaps,  then,  any 
benefit  of  normative  goals  over  appearance  goals  is  unique  to 
achievement.  The  two  PAp  goals  might  produce  equally  weak  (or 
negative)  effects  on  other  outcomes  intimately  linked  to  learning, 
such  as  self-regulation  or  “deep”  study  strategies  that  entail  elab¬ 
orating  or  evaluating  course  concepts.  Of  course,  the  other  possi¬ 
bility  is  that  normative  goals  also  aid  learning.  After  all,  academic 
achievement,  despite  its  flaws,  is  facilitated  by  many  learning- 
related  outcomes  widely  hailed  in  the  field,  such  as  self-efficacy 
and  self-regulation  (Crede  &  Kuncel,  2008;  Robbins  et  al.,  2004), 
and  hindered  by  others  widely  denounced,  such  as  self-handicapping 
(Schwinger,  Wirthwein,  Lemmer,  &  Steinmayr,  2014).  Perhaps,  then,
normative  goals,  more  so  than  appearance  goals,  facilitate  a  relatively
adaptive  nomological  network  of  thoughts,  emotions,  and  behaviors 
that  support  learning  as  well  as  achievement.  There  is  a  clear  need  to
compare  the  two  PAp  goals’  effects  on  other  educational  outcomes 
besides  achievement. 

The  present  meta-analysis  does  this.  We  relied  mostly  on  prior 
meta-analyses  (Baranik  et  al.,  2010;  Cellar  et  al.,  2011;  Huang, 
2011a;  Payne  et  al.,  2007)  to  identify  the  outcome  variables  for 
inclusion:  competence  perceptions,  various  studying-related  strat¬ 
egies,  and  several  positive  and  negative  emotions,  all  of  which  are 
detailed  below.  This  list  necessarily  excludes  several  other  impor¬ 
tant  outcomes  that  goal  researchers  have  begun  to  test  in  recent 
years,  most  notably  social  ones  (e.g.,  peer  relationship  quality, 
feelings  of  belongingness),  moral  ones  (e.g.,  tolerance  of  cheat¬ 
ing),  or  health  ones  (e.g.,  burnout).  For  each,  we  found  too  few 
studies  to  allow  a  viable  test  of  whether  they  are  affected  differ¬ 
ently  by  appearance  and  normative  PAp  goals. 

Competence  perceptions.  Competence  perceptions,  which
have  always  played  a  pivotal  role  in  achievement  goal  theory 
(Dweck,  1986;  Elliot,  1999;  Nicholls,  1984),  are  well-known  to 
facilitate  task  performance  and  related  processes  (Valentine, 
DuBois,  &  Cooper,  2004).  Past  research  shows  they  typically  have 
moderately  sized,  positive  links  with  PAp  goals  (see  Baranik  et  al., 
2010;  Cellar  et  al.,  2011). 

Yet  the  competence  perception  construct  takes  many  forms.  One 
well-known  distinction  is  between  relatively  stable  judgments  of 
one’s  ability  (e.g.,  academic  self-concept;  Marsh,  1990)  versus 
task-specific  expectations  (e.g.,  self-efficacy;  e.g.,  Pajares,  1996). 
Those  two  perceptions  often  correlate  strongly  (e.g.,  Bong  & 
Skaalvik,  2003),  making  their  distinction  unlikely  to  matter  here. 
One  other  difference  in  competence  perceptions  may  prove  impor¬ 
tant  for  PAp  goals,  however.  For  general  ability  judgments  or 
task-specific  expectancies  alike,  some  measures  encourage  respon¬ 
dents  to  compare  themselves  to  peers  (e.g.,  Marsh,  1990;  Pintrich 
&  De  Groot,  1990),  whereas  other  measures  refer  solely  to  the  task 
(e.g.,  Elliot  &  Church,  1997;  Midgley  et  al.,  2001).  Normative 
goals  might  correlate  more  strongly  with  the  former  because  of 
their  shared  emphasis  on  social  comparison.  The  more  pressing 
issue,  though,  is  whether  normative  goals  have  different  effects 
than  appearance  goals  on  both  measures  of  competence  percep¬ 
tions. 

Study  strategies.  Achievement  goals  have  long  been  assumed 
to  trigger  different  strategies  for  learning  and  studying  (e.g.,  Pin¬ 
trich,  1999;  Wolters,  Yu,  &  Pintrich,  1996).  These  include  deep 
and  surface  strategies,  among  others.  Deep  strategies  entail  sum¬ 
marizing  and  elaborating  concepts,  generating  personal  examples, 
creating  analogies,  asking  questions,  evaluating  theories,  and  so 
forth.  Surface  strategies  entail  passive  note-taking  and  rote  mem¬ 
orization.  PAp  goals  predict  both  strategies  positively,  yet  more 
strongly  so  for  the  surface  ones  (see  Payne  et  al.,  2007).  As  with 
achievement  goals,  however,  theorists  disagree  about  how  best  to 
conceptualize  those  surface  strategies  (see  Pintrich,  2004).  Some 
frame  them  as  maladaptive  strategies  done  out  of  work-avoidance 
or  confusion  or  extrinsic  motivation,  thus  being  incompatible  with 
deep  strategies  and  quality  learning  (e.g.,  Biggs,  1993;  Entwistle  & 
McCune,  2004).  Their  inventories  reflect  this  assumption  (e.g.,  “I 
learn  some  things  by  rote,  going  over  and  over  them  until  I  know 
them  by  heart  even  if  I  do  not  understand  them”;  Biggs,  Kember, 
&  Leung,  2001).  Other  theorists  instead  consider  the  surface  and
deep  strategies  to  be  independent  or  even  overlapping,  each  potentially
supportive  of  learning  (e.g.,  Pintrich  &  De  Groot,  1990).
Their  surface  strategy  measures  therefore  emphasize  only  re¬ 
hearsal,  stripped  of  any  underlying  motives  (e.g.,  “I  memorize  key 
words  to  remind  me  of  important  concepts  in  this  class”;  Pintrich, 
Smith,  Garcia,  &  McKeachie,  1993).  Given  these  opposing  view¬ 
points,  our  meta-analysis  will  distinguish  the  maladaptive  surface 
strategies  favored  by  the  first  perspective  from  the  adaptive  sur¬ 
face  strategies  favored  by  the  second  perspective. 

Student  learning  and  achievement  are  also  aided  by  metacognition
and  self-regulation,  the  process  of  monitoring  one’s  own  goal
progress,  identifying  knowledge  gaps,  and  changing  strategies  if 
necessary  (Winne,  1995).  Whereas  MAp  goals  are  often  touted  for 
facilitating  self-regulation,  PAp  goals  are  not  (e.g.,  Middleton  & 
Midgley,  1997;  Wolters,  2004).  The  one  prior  meta-analysis  of 
PAp  goals  and  self-regulation  found  no  link  between  them  (Cellar 
et  al.,  2011). 

Students  may  also  engage  in  more  socially  oriented  study  strat¬ 
egies  that  harm  or  aid  their  learning  and  performance.  One  adap¬ 
tive  strategy  is  help-seeking,  which  is  typically  unrelated  to  PAp 
goals  (e.g.,  Ryan  &  Pintrich,  1997).  Two  others  are  maladaptive 
strategies  done  largely  out  of  fear  of  being  judged  incompetent. 
The  first  is  help-avoidance.  The  other  is  self-handicapping:  to 
claim  or  actually  produce  handicaps  to  success,  thereby  allowing 
an  external  attribution  in  case  one  performs  poorly  (Rhodewalt, 
1990).  Several  studies  link  PAp  goals  to  both  help-avoidance  (e.g., 
Ryan  &  Pintrich,  1997)  and  self-handicapping  (see  Urdan  &  Midg¬ 
ley,  2001). 

Emotions.  Achievement  goals  should  affect  learning  in  part 
through  the  emotions  they  evoke  during  task  engagement  (Linnen- 
brink  &  Pintrich,  2002;  Pekrun,  Elliot,  &  Maier,  2006).  Many 
studies  indicate  positive  yet  mild  links  between  PAp  goals  and 
positive  emotions,  most  notably  task  enjoyment  and  generalized 
positive  affect.  However,  PAp  goals  also  have  positive  and  mild 
links  with  negative  emotions,  most  notably  anxiety  and  negative 
affect  (for  meta-analyses,  see  Baranik  et  al.,  2010,  and  Huang, 
2011a).  The  present  meta-analysis  tests  whether  these  seemingly 
contradictory  patterns  are  partly  explained  by  the  two  types  of  PAp 
goals  having  different  relationships  with  positive  and  negative 
emotions.  Other  emotions  (e.g.,  pride,  shame,  and  boredom)  were 
excluded  because  only  a  few  eligible  studies  tested  them,  nearly  all 
using  normative  PAp  goals  (see  Huang,  2011a,  for  a  summary). 

Extension  2:  Is  Sample  Age  Confounded  With 
PAp  Type? 

The  present  meta-analysis  extends  Hulleman  et  al.’s  (2010)  in 
another  way,  too.  One  limitation  of  any  meta-analysis  is  that, 
because  it  cannot  standardize  every  feature  of  the  studies  being 
compared,  the  predictor  variable  (e.g.,  PAp  goals)  can  become 
confounded  with  other  methodological  features  that  also  vary 
systematically between those studies. In acknowledging this, Hulleman
et al. noted that the goal measures used by researchers are
slightly confounded with the age of students tested. Research with
young samples (i.e., elementary and middle-school) typically has
assessed appearance PAp goals, whereas research with older samples
has used an even mix of the two types of PAp goals. Might
this  confound  explain  the  two  goals’  different  effects?  Theorists 
have  long  posited  that  younger  students,  owning  a  more  fragile 
sense of self and being less accustomed to ability evaluation or
social  comparison,  are  more  vulnerable  to  the  mild  anxiety  and 
other  potential  costs  of  PAp  goals  (e.g.,  Dweck,  1986;  Midgley  et 
al., 2001). If true, then either type of PAp goal would produce
more  harmful  effects  with  younger  samples,  in  which  case  the 
meta-analytic  difference  between  the  two  goals  could  be  a  mirage. 
We  are  unaware  of  any  direct  tests  of  this  possibility.  But  four 
prior  meta-analyses  have  tested  whether  the  PAp  goal’s  link  with 
achievement  depends  on  students’  age  or  other  demographics; 
three  show  the  link  prevails  unabated  across  different  age  groups, 
ethnicities and nationalities (Huang, 2011b; Hulleman et al., 2010;
Van  Yperen  et  al.,  2014),  whereas  another  found  the  effect  to  be 
positive  with  college  student  samples  but  null  with  elementary  and 
middle-school-age  children  (Wirthwein,  Sparfeldt,  Pinquart, 
Wegerer,  &  Steinmayr,  2013).  None  disentangled  age  from  the 
different  types  of  PAp  goal  measures,  however.  The  present  meta¬ 
analysis  will  do  so. 

Overview  of  Current  Meta-Analysis 

In  sum,  the  current  meta-analysis  extends  Hulleman  et  al.’s 
(2010)  in  two  ways.  One  is  to  test  how  well  their  findings  gener¬ 
alize  to  numerous  learning-relevant  outcomes.  No  prior  meta¬ 
analysis  has  done  so.  The  other  is  to  test  whether  any  differences 
between  PAp  goal  subtypes  are  attributable  to  an  underlying  age 
confound.  These  two  improvements  allow  clearer  insight  into  how 
much  normative  and  appearance  PAp  goals  truly  differ  in  predic¬ 
tive  validity — an  essential  foundation  for  any  eventual  discussion 
within  the  field  about  whether  to  reconsider  the  PAp  goal  con¬ 
struct. 

This  meta-analysis  is  largely  exploratory,  but  there  are  sensible 
grounds  for  expecting  the  two  PAp  goals  to  impact  some  outcomes 
differently.  If  the  existing  link  between  normative  goals  and  aca¬ 
demic  achievement  reflects  quality  learning,  those  goals  may  also 
have  stronger  links  than  appearance  goals  with  outcomes  that  most 
directly  facilitate  achievement — in  particular,  competence  percep¬ 
tions  (see  Crede  &  Kuncel,  2008)  and  adaptive  learning  strategies, 
especially  self-regulation  and  deep  learning  strategies  (see  Rob¬ 
bins  et  al.,  2004).  Conversely,  appearance  goals  may  have  stronger 
links  than  normative  goals  with  self-handicapping  and  help- 
avoidance,  both  of  which  are  driven  by  self-presentation  concerns 
(Rhodewalt,  1990;  Ryan  &  Pintrich,  1997).  For  the  other  out¬ 
comes — adaptive  or  maladaptive  surface  learning  strategies,  help¬ 
seeking,  and  positive  or  negative  emotions — there  is  no  clear-cut 
theoretical  reason  for  hypothesizing  stronger  effects  of  one  PAp 
goal  over  the  other. 

Method 

Sample  of  Studies 

We  used  the  PsycINFO,  ERIC,  and  Academic  Search  Complete 
databases  and  restricted  our  search  to  articles  published  prior  to 
August  1,  2014  in  English-speaking,  peer-reviewed  journals.  Ex¬ 
cluding  unpublished  studies  may  of  course  bias  results  toward 
stronger  effects,  which  have  greater  odds  of  being  published  (i.e., 
the  “file  drawer  problem”;  Rosenthal,  1979).  We  will  explore  this 
issue  later  in  the  paper. 


This  meta-analysis  includes  a  wide  range  of  educational  out¬ 
comes,  all  measured  by  student  self-report.  One  is  competence 
perceptions,  which  has  strong  facilitative  links  to  academic 
achievement  and  interest  (Robbins  et  al.,  2004).  Seven  outcomes 
are  specific  strategies  that  may  either  aid  learning  (i.e.,  self¬ 
regulation,  deep  learning  strategies,  adaptive  surface  strategies, 
and  help-seeking)  or  hinder  learning  (i.e.,  maladaptive  surface 
strategies,  self-handicapping,  and  help-avoidance).  The  remaining 
four  outcomes  are  affective,  including  two  that  are  broad  in  scope 
(i.e.,  positive  affect  and  negative  affect)  and  two  that  are  specific 
emotions  (i.e.,  enjoyment  and  anxiety). 

Every search required achievement goal* or goal orientation*
in the text, and then was narrowed to include keywords
pertaining to the relevant outcome (see the Appendix for the full
list of keywords). For example, the competence perception keywords
included “perceived competenc*”, “competence expectanc*”,
“self-efficacy”, and other variants. For the emotion
outcomes,  we  supplemented  this  search  strategy  with  published 
articles  listed  in  Huang’s  (2011a)  meta-analysis  of  goals  and 
emotions,  because  our  initial  search  provided  a  low  harvest  due  to 
emotions  seldom  being  of  primary  importance  in  achievement  goal 
studies. 

Eligibility  Criteria 

Studies  were  further  screened  for  three  requirements  for  inclu¬ 
sion.  First,  papers  must  include  zero-order  correlations  between 
PAp  goals  and  at  least  one  of  the  outcomes.  Second,  papers  must 
include  sample  sizes  to  appropriately  weight  their  effect  sizes 
(Lipsey  &  Wilson,  2001).  Third,  we  must  have  access  to  the 
complete  PAp  goal  measure.  Most  studies  used  established  goal 
measures,  in  which  case  they  were  included  so  long  as  the  version 
of  the  measure  was  clear.  When  studies  used  customized  measures 
and  provided  only  sample  items,  we  contacted  the  authors  to 
request  the  full  measure. 

Final  Sample  of  Studies 

This  meta-analysis  included  296  studies,  cumulating  in  314 
independent  samples  totaling  115,250  participants  (56%  female). 
Most samples were from North America (k = 169), followed by
Europe (k = 89), Asia (k = 45), the Middle East (k = 9), and
South America (k = 2). Most also were tested in educational
settings (k = 253), with the remaining in sport (k = 45) or
occupational (k = 16) settings.

Sample  age  is  an  important  control  variable  for  the  moderator 
analyses  described  later.  Most  papers  provided  it.  Others  instead 
provided  the  grade  level,  which  we  used  to  estimate  sample  age 
based  on  established  grade-age  norms.  In  cases  where  the  sample 
comprised  a  range  of  grades  but  the  reported  effects  were  aggre¬ 
gated  across  the  sample  (e.g.,  7th  through  12th  grades;  Murayama 
&  Elliot,  2009),  we  used  the  median  grade  level  as  a  conservative 
estimate  of  the  sample’s  age  (across  all  studies,  M  =  18.5  years, 
SD  =  6.3).  Most  studies  requiring  that  strategy  were  of  college  or 
adult  samples;  very  few  studies  of  younger  students  needed  it. 

Goal  Coding 

Classifying  individual  items.  The  set  of  studies  in  this  meta¬ 
analysis  included  48  unique  PAp  goal  measures.  Using  Hulleman 




et  al.’s  (2010)  typology,  we  coded  the  thematic  content  of  each 
item  in  every  PAp  goal  measure.  Appearance  items  emphasize 
appearing  intelligent  to  others  (e.g.,  “One  of  my  goals  is  to  show 
others  that  I’m  good  at  my  class  work”;  Midgley  et  al.,  2000). 
Normative items emphasize only interpersonal success (e.g., “It is
important  for  me  to  do  better  than  other  students”;  Elliot  & 
McGregor,  2001).  Some  items  combined  the  appearance  and  nor¬ 
mative  elements  (e.g.,  “It’s  important  to  me  that  I  look  smart 
compared  with  others  in  my  class”;  Midgley  et  al.,  2000).  In  those 
cases,  we  followed  Hulleman  et  al.’s  (2010)  protocol  of  assigning 
them  to  the  PAp-Appearance  category  because,  based  on  the  logic 
of  the  goal  orientation  model,  the  normative  standard  is  pursued  in 
service  of  the  overarching  goal  to  appear  talented. 1  Finally,  Gen¬ 
eral  items  capture  themes  other  than  outperforming  others  or 
appearing  talented.  In  most  cases,  the  item  focused  on  obtaining  a 
desirable  grade  or  outcome  (e.g.,  “The  main  reason  I  do  my  work 
in science is because we get grades”; Anderman, Griesinger, &
Westerfield,  1998).  In  other  cases,  it  focused  on  avoiding  chal¬ 
lenge  or  uncertainty  (e.g.,  “I  like  to  be  fairly  confident  that  I  can 
successfully perform a task before I attempt it”; Button et al.,
1996). All coding of items was done by the two authors, who
remained blind to specific findings while coding. Agreement rates
were high across the 223 PAp goal items (91%; Cohen’s κ =
87%). Discrepancies were resolved through discussion.
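
The two agreement indices reported above (raw percent agreement and Cohen's kappa) can be illustrated with a minimal sketch. The function names and the toy codes below are hypothetical and are not the authors' data or procedure; the sketch only shows how kappa corrects raw agreement for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items to which both coders assigned the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: product of the coders' marginal proportions,
    # summed over categories.
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example with hypothetical item codes (N = normative,
# A = appearance, G = general); not the study's items.
coder_1 = ["N", "N", "A", "G", "N", "A"]
coder_2 = ["N", "N", "A", "G", "A", "A"]
print(percent_agreement(coder_1, coder_2))
print(cohens_kappa(coder_1, coder_2))
```

Kappa is always at or below raw agreement when chance agreement is positive, which is why both indices are usually reported together.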

Our  coding  scheme  does  depart  from  Hulleman  et  al.’s  (2010)  in 
one  way.  They  differentiated  between  “goal”  items  and  “no  goal” 
items.  Goals  represent  a  future-based  and  competence-relevant 
endpoint  that  one  either  strives  toward  or  away  from  (Elliot,  2005). 
Many  PAp  goal  measures  include  goal  language  by  beginning 
statements,  “My  goal  is  to.  .  .  .,”  or  “One  of  my  aims  is  to.  .  .  .,” 
or  “It  is  important  to  me  to.  .  .  .”  or  “I  am  trying.  .  .  .”  Other 
measures  lack  goal  language  and  instead  emphasize  affective  en¬ 
gagement  (e.g.,  “I  feel  successful  at  school  when  I  do  the  work 
better than other students”, italics added for emphasis; Skaalvik,
1997) or a preference between two contrasting options (e.g., “I
prefer  to  do  things  that  I  can  do  well  rather  than  things  that  I  do 
poorly”,  italics  added  for  emphasis;  Button  et  al.,  1996).  Such 
items  were  coded  as  No  Goal  by  Hulleman  et  al.  (2010).  When 
adopting  their  approach  in  our  preliminary  analyses,  the  No  Goal 
category  accounted  for  28%  of  all  PAp  goal  items.  We  considered 
this  high  rate  unsatisfying  because  those  items,  despite  lacking 
goal  language,  often  still  include  clear  themes  relevant  to  the  three 
PAp  goal  subtypes.  For  example,  the  item  above  by  Skaalvik 
(1997)  emphasizes  normative  success,  whereas  the  one  by  Button 
et  al.  (1996)  emphasizes  challenge-avoidance  (i.e.,  a  general  goal) 
rather  than  competence  demonstration  or  normative  success.  To 
mix  such  items  into  a  separate  No  Goal  category  ignores  those 
thematic  distinctions,  making  this  category  too  broad  for  mean¬ 
ingful  comparison  in  analyses.  Therefore,  we  coded  each  item  for 
its  thematic  content,  regardless  of  whether  it  fit  the  strict  goal 
definition.  Approximately  60%  of  the  items  that  would  have  been 
classified  as  No  Goal  were  instead  classified  as  normative  (e.g.,  “I 
am happy only when I am one of the best in class”; McInerney,
Yeung, & McInerney, 2001), and the rest were reclassified nearly
equally  as  either  appearance  (e.g.,  “To  be  honest,  I  really  like  to 
prove  my  ability  to  others”;  Vandewalle,  1997)  or  general  (e.g., 
“Getting  a  good  grade  in  this  class  is  the  most  satisfying  thing  for 
me  right  now”;  Pintrich  et  al.,  1993).2 


Classifying  full  measures.  After  coding  all  individual  items, 
we  computed  how  much  (proportionally)  each  PAp  goal  measure 
captures  appearance,  normative,  or  general  themes.  For  example, 
Skaalvik’s (1997) 5-item PAp goal measure was scored 20%
Appearance,  80%  Normative,  and  0%  General.  Each  measure 
therefore  had  multiple  separate  Percentile  Variable  scores — one 
apiece  for  each  PAp  goal  subtype,  all  summing  to  100%. 

Then  we  also  classified  each  goal  measure  according  to  its 
predominant  theme.  Skaalvik’s  (1997)  measure,  for  example,  was 
classified  as  predominantly  normative.  If  multiple  themes  of  PAp 
goals  were  equally  represented  in  a  measure  (e.g.,  Roeser,  Midg¬ 
ley,  &  Urdan,  1996),  or  if  no  particular  subgroup  accounted  for 
more  than  50%  of  the  measure  (e.g.,  Greene,  Miller,  Crowson, 
Duke,  &  Akey,  2004),  the  measure  was  labeled  as  NoMajority. 
Table  1  provides  frequencies  and  representative  measures  for  each 
Majority  Scale. 
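
The two-step measure classification described above can be sketched as follows. This is an illustrative reading of the stated rules, not the authors' code; the names `THEMES`, `theme_percentages`, and `majority_label` are hypothetical. A measure receives a predominant theme only when one theme exceeds 50% of its items, which also rules out ties.

```python
from collections import Counter

THEMES = ("appearance", "normative", "general")


def theme_percentages(item_codes):
    """Percentage of a measure's items coded under each theme (sums to 100)."""
    if not item_codes:
        raise ValueError("a measure needs at least one coded item")
    counts = Counter(item_codes)
    unknown = set(counts) - set(THEMES)
    if unknown:
        raise ValueError(f"unknown theme codes: {sorted(unknown)}")
    total = len(item_codes)
    return {theme: 100.0 * counts[theme] / total for theme in THEMES}


def majority_label(item_codes):
    """Predominant theme, or 'NoMajority' when no theme exceeds 50%."""
    percentages = theme_percentages(item_codes)
    top_theme = max(percentages, key=percentages.get)
    return top_theme if percentages[top_theme] > 50.0 else "NoMajority"


# A 5-item measure with one appearance item and four normative items,
# matching the 20% / 80% / 0% split given in the text for Skaalvik (1997).
skaalvik_items = ["appearance"] + ["normative"] * 4
print(theme_percentages(skaalvik_items))
print(majority_label(skaalvik_items))
```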

Analytic  Strategy 

All effect sizes were correlation coefficients. Following established
procedures (Lipsey & Wilson, 2001), those coefficients
were Fisher-z transformed to weight them by sample size, with
inverse  variance  weights  used  during  data  analysis.  Correlation 
effect  sizes  reported  herein  were  reconverted  with  the  inverse  of 
the  Fisher  transformation.  All  analyses  applied  a  random  effects 
model  and  were  done  with  Lipsey  and  Wilson’s  (2001)  macros  for 
SPSS. 
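
The pipeline described above (Fisher-z transformation, inverse variance weighting, a random effects model, and back-conversion to r) can be sketched in a few lines. This is a minimal illustration, not a reimplementation of the SPSS macros: the between-study variance is estimated here with the DerSimonian-Laird method of moments, which is an assumption, and the function name and input values are hypothetical.

```python
import math


def random_effects_mean_r(correlations, sample_sizes):
    """Random-effects mean correlation via Fisher's z.

    Assumes the DerSimonian-Laird estimator for the between-study
    variance (tau^2); other estimators give slightly different results.
    """
    z_values = [math.atanh(r) for r in correlations]       # Fisher z
    variances = [1.0 / (n - 3) for n in sample_sizes]      # var(z) = 1/(n-3)
    weights = [1.0 / v for v in variances]                 # inverse variance
    sum_w = sum(weights)

    # Fixed-effect mean and the heterogeneity statistic Q.
    mean_fixed = sum(w * z for w, z in zip(weights, z_values)) / sum_w
    q = sum(w * (z - mean_fixed) ** 2 for w, z in zip(weights, z_values))
    df = len(z_values) - 1

    # Method-of-moments estimate of tau^2, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    mean_re = sum(w * z for w, z in zip(re_weights, z_values)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    return {
        "r": math.tanh(mean_re),                 # back-transformed mean
        "z_score": mean_re / se,
        "ci_low": math.tanh(mean_re - 1.96 * se),
        "ci_high": math.tanh(mean_re + 1.96 * se),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical inputs, not values from the meta-analysis.
result = random_effects_mean_r([0.10, 0.25, 0.18], [120, 300, 85])
print(result)
```

When Q does not exceed its degrees of freedom, tau^2 is set to zero and the random-effects estimate coincides with the fixed-effect one.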

Some  studies  provided  multiple  correlations  for  the  PAp  goal’s 
link  with  an  outcome  variable.  This  was  sometimes  due  to  the 
same  participants  completing  either  (a)  the  same  goal  and  outcome 
measures  multiple  times  in  a  longitudinal  study  (e.g.,  Gutman, 
2006),  (b)  the  same  goal  and/or  outcome  measure  separately  for 
different  classes  (e.g.,  Vogler  &  Bakken,  2007),  or  (c)  multiple 
measures  tapping  the  same  broad  outcome  construct,  such  as  state 
and  trait  measures  of  anxiety  (e.g.,  Elliot  &  McGregor,  1999).  In 
those  cases,  the  effect  sizes  for  each  goal  were  aggregated  into  a 
single  effect  to  preserve  data  independence  (Lipsey  &  Wilson, 
2001).  The  lone  exception  to  this  rule  was  for  the  few  studies  that 
directly  compared  the  effects  of  different  types  of  PAp  goal 
measures  (Day,  Radosevich,  &  Chasteen,  2003;  Grant  &  Dweck, 
2003;  Heidemeier  &  Bittner,  2012;  Potosky,  2010;  Smith,  Duda, 
Allen,  &  Hall,  2002),  in  which  case  the  separate  effect  sizes  were 
retained  for  analyses  because  we  wished  to  compare  the  goal 
measures  too. 
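
The aggregation step above can be illustrated with a short sketch. The text does not state how dependent correlations were combined, so averaging in the Fisher-z metric is an assumption made here for illustration only, and `aggregate_correlations` is a hypothetical helper name.

```python
import math


def aggregate_correlations(correlations):
    """Collapse several correlations from one sample into a single effect.

    Assumption: the correlations are averaged after Fisher's z
    transformation and the mean is converted back to r.
    """
    if not correlations:
        raise ValueError("need at least one correlation")
    mean_z = sum(math.atanh(r) for r in correlations) / len(correlations)
    return math.tanh(mean_z)


# Hypothetical example: state and trait anxiety measured in the same sample.
print(aggregate_correlations([0.12, 0.20]))
```

Each sample then contributes one effect size per outcome, which keeps the weighted analyses from counting the same participants more than once.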


1 Hulleman et al. (2010) labeled these “evaluative” goals but then
aggregated them with the appearance goals. We eschewed the “evaluative”
goal label for simplicity. They were rare in this meta-analysis (24 of 223
items) and, with one exception (Grant & Dweck, 2003), never comprised
the majority of a PAp goal measure’s items.

2  Although  there  are  many  NoGoal  measures,  those  measures  have  been 
used  far  less  frequently  than  Goal-based  measures.  Consequently,  for 
nearly  all  outcomes,  there  were  only  between  0  and  3  studies  for  NoGoal 
versions  of  the  different  PAp  types  (e.g.,  NoGoal-Normative).  This  pre¬ 
vents  reliable  comparisons  for  those  NoGoal  versions.  The  one  exception 
was  competence  perceptions,  for  which  there  were  over  a  dozen  studies  of 
the  NoGoal  versions  of  each  PAp  type.  The  basic  finding  described  later 
for  the  full  set  of  studies  was  duplicated  even  when  restricting  the  analysis 
to  those  NoGoal  measures. 
Table 1

Frequencies of Each Performance-Approach (PAp) Goal’s Majority Code Scales in the Meta-Analysis

| Descriptors | PAp goal subtype: Normative | Appearance | General | No majority |
|---|---|---|---|---|
| # Measures | 26 | 13 | 4 | 5 |
| # Samples (K) | 210 | 65 | 29 | 10 |
| Sample items | “My goal in this class is to get a better grade than most of the students” (Elliot & Church, 1997); “Doing better than other students in class is important to me” (Midgley et al., 1998) | “One of my goals is to show others that I’m good at my class work” (Midgley et al., 2000); “I try to figure out what it takes to prove my ability to others at work” (Vandewalle, 1997) | “It is important for me to establish a good overall grade-point” (Harackiewicz et al., 2000); “Getting a good grade in this class is the most satisfying thing for me right now” (Pintrich et al., 1993) | |
| Representative measures | Elliot and Church (1997); Midgley et al. (1998); Skaalvik (1997) | Midgley et al. (2000); Spinath and Steinmayr (2012); Vandewalle (1997) | Anderman et al. (1998); Button et al. (1996); Pintrich et al. (1993) | Greene et al. (2004); Roeser et al. (1996) |

Note. Samples (K) refers to number of independent samples in the entire meta-analysis, and it varies between each outcome tested. Measures categorized
as No Majority include a combination of the three themes (normative, appearance, or general) without any theme the most dominant.


Results 

Overall  Effects 

Analyses  proceeded  in  two  phases.  The  first  examined  overall 
effect  sizes  for  the  relationships  (i.e.,  zero-order  correlations) 
between  PAp  goals  and  each  outcome  variable.  Table  2  provides 
the  PAp  goal’s  overall  effect  sizes  (weighted  r)  and  corresponding 
z  scores  and  significance  levels,  and  95%  confidence  intervals 
(CI). Following Cohen’s (1992) guidelines, these effect sizes can
be  considered  small  at  r  =  .10,  medium  at  r  =  .30,  and  large  at  r  > 
.50. 


PAp  goals  predicted  all  outcomes  except  help-seeking.  Some  of 
their  effects  are  undesirable:  PAp  goals  significantly  predicted 
maladaptive  surface  strategies  (r  =  .14),  self-handicapping  (r  = 
.07),  and  help-avoidance  (r  =  .11),  plus  negative  affect  (r  =  .11) 
and  anxiety  (r  =  .13).  Other  effects,  in  contrast,  are  desirable:  PAp 
goals  significantly  predicted  general  competence  perceptions  (r  = 
.19), self-regulation (r = .10), deep learning strategies (r = .15), and
adaptive surface strategies (r = .21), as well as positive
affect (r = .11) and task enjoyment (r = .15). Note as well that
PAp  goals  predicted  both  types  of  competence  measures,  whether 
in  reference  to  the  task  (r  =  .15)  or  in  comparison  with  peers  (r  = 
.28),  though  the  latter  effect  is  stronger.  Putting  these  assorted 


Table 2

Overall Effects of Performance-Approach Goal on All Outcomes

| Outcome | K | Total N | Weighted r | z score | 95% CI-LB | 95% CI-UB | Qw | Fail safe N |
|---|---|---|---|---|---|---|---|---|
| Competence perceptions | 204 | 83,256 | .19*** | 15.13 | .17 | .22 | 2528.11*** | 7,692 |
| Relative to task | 132 | 42,114 | .15*** | 10.56 | .12 | .18 | 1146.80*** | 3,873 |
| Relative to peers | 72 | 41,142 | .28*** | 14.16 | .24 | .32 | 956.12*** | 4,128 |
| Learning strategies | | | | | | | | |
| Self-regulation | 39 | 14,408 | .10*** | 3.33 | .04 | .16 | 461.38*** | 745 |
| Deep strategy | 48 | 17,605 | .15*** | 6.71 | .11 | .19 | 367.64*** | 1,408 |
| Adaptive surface strategy | 27 | 8,090 | .21*** | 6.14 | .14 | .27 | 215.41*** | 1,133 |
| Maladaptive surface strategy | 11 | 4,493 | .14* | 2.10 | .01 | .26 | 166.17*** | 300 |
| Self-handicapping | 21 | 7,319 | .07* | 2.44 | .01 | .13 | 109.15*** | 274 |
| Help-avoidance | 18 | 5,694 | .11*** | 4.26 | .06 | .16 | 55.55*** | 380 |
| Help-seeking | 20 | 4,385 | .04 | 1.27 | -.02 | .10 | 75.72*** | 140 |
| Emotions | | | | | | | | |
| Negative affect | 26 | 8,317 | .11*** | 6.01 | .08 | .15 | 57.96** | 549 |
| Anxiety | 73 | 22,825 | .13*** | 9.25 | .11 | .16 | 304.00*** | 1,816 |
| Positive affect | 22 | 7,252 | .11*** | 4.40 | .06 | .16 | 84.92*** | 465 |
| Enjoyment | 35 | 12,063 | .15*** | 5.94 | .10 | .20 | 230.70*** | 1,027 |

Note. K = number of independent effect sizes in meta-analysis; N = number of participants. All effect sizes (weighted r) and corresponding z-scores and
confidence intervals assume a random effects model. Z-scores above 1.96 and confidence intervals excluding zero are statistically significant (p < .05). LB
and UB are lower and upper bounds of the 95% confidence interval (CI). Qw indexes the heterogeneity of the effect size. Fail-Safe N indicates number of
unpublished studies needed to reduce overall effect size to zero.

* p < .05. ** p < .01. *** p < .001.
effects side-by-side shows a highly mixed and even contradictory
pattern.  For  example,  PAp  goals  predicted  higher  negative  affect 
and  anxiety,  but  those  effects  are  offset  by  equally  modest  effects 
on  positive  affect  and  enjoyment.  Likewise,  they  predicted  greater 
help-avoidance  and  self-handicapping,  but  also  greater  use  of  deep 
strategies  and  self-regulation. 

Moderator  Analyses 

The  second  phase  of  analyses  tested  the  main  research  question: 
Do the PAp goal’s effect sizes depend on how the goal is
operationally  defined?  Indeed,  as  shown  in  Table  2,  all  PAp  goal 
effects  showed  sizable  and  significant  heterogeneity  (Qw),  indicat¬ 
ing  variability  attributable  to  factors  that  differ  systematically 
between  studies.  Moderator  analyses  tested  whether  the  varied 
thematic  content  of  the  PAp  goal  measure  helps  account  for  this 
heterogeneity. 

This  was  done  by  partitioning  the  sample  of  studies  based  on 
the  majority  scale  code,  and  then  comparing  the  effects  of  each 
PAp  goal  subtype.  We  used  a  weighted  least  squares  metare¬ 
gression  procedure  so  that  we  could  include  sample  age  as  a 
covariate  (Lipsey  &  Wilson,  2001).  The  regression  model  in¬ 
cluded  age  plus  three  dummy  codes  that  compared  the  norma¬ 
tive  goal  against  the  other  subtypes  (i.e.,  appearance,  general, 
and  NoMajority).  The  normative  goal  was  chosen  as  the  key 
comparison  group  because  it  aligns  with  the  goal  standards 
model,  whereas  the  other  PAp  goal  measures  typically  were 
from  studies  anchored  to  the  traditional  goal  orientation  model. 
Significant  positive  effects  for  those  dummy  codes  indicate  that 
the  relationship  between  the  PAp  goal  and  outcome  was  weaker 
for  the  normative  subtype  than  the  comparison  PAp  subtype.  As 
shown  in  Table  1,  26  (54%)  of  the  PAp  goal  measures  were 
normative  and  13  (27%)  were  appearance.  The  general  and 
NoMajority categories were far less common; for some outcomes,
in fact, they included zero or only one study (see Table 3).
In those cases, the regression model excluded the corresponding
dummy code(s).
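
The moderator model described above can be sketched as a weighted least squares regression with the normative subtype as the reference category. This is an illustrative sketch in plain Python, not the SPSS macro the authors used; the helper names and the example values are hypothetical, and the sketch returns coefficients only (no standard errors or significance tests). A positive dummy coefficient means the comparison subtype showed a stronger effect than the normative subtype.

```python
def dummy_codes(subtype, comparisons):
    """1.0 for the matching comparison subtype, 0.0 otherwise.

    The reference category (normative) is coded 0 on every dummy.
    """
    return [1.0 if subtype == name else 0.0 for name in comparisons]


def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; drop a dummy code")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_least_squares(predictors, effects, weights):
    """Coefficients [intercept, b1, b2, ...] minimizing weighted squared error."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtwx = [[sum(w * x[j] * x[k] for w, x in zip(weights, design))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * x[j] * y for w, x, y in zip(weights, design, effects))
            for j in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical studies: (subtype, mean sample age, Fisher-z effect, weight).
studies = [
    ("normative", 12.0, 0.18, 97.0),
    ("normative", 20.0, 0.15, 247.0),
    ("appearance", 12.0, 0.04, 150.0),
    ("appearance", 20.0, 0.02, 60.0),
    ("normative", 16.0, 0.17, 120.0),
]
comparisons = ["appearance"]  # dummies with no studies are left out
rows = [dummy_codes(s, comparisons) + [age] for s, age, _, _ in studies]
coefficients = weighted_least_squares(
    rows, [z for _, _, z, _ in studies], [w for _, _, _, w in studies])
print(coefficients)  # [intercept, appearance vs. normative, sample age]
```

Leaving out a dummy code for a subtype with no studies, as in the `comparisons` list here, mirrors the exclusion rule stated above and avoids a singular design matrix.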

Table 3 provides mean effect sizes (r) and number of studies (k)
for each type of PAp goal. Table 4 provides data for the PAp goal
type comparisons, including beta coefficients (β), z-scores and
significance levels, plus unstandardized regression coefficients (B)
and their corresponding 95% confidence intervals (CI). Analyses
are summarized below separately for each PAp subtype comparison.

Normative  versus  Appearance  PAp  goals.  Normative  and 
appearance  PAp  goals  produced  different  patterns  for  several 
outcomes,  all  in  ways  that  indicate  a  more  adaptive  profile  for 
normative goals. Specifically, normative goals provided stronger
benefits than appearance goals to self-regulation (rs = .16
vs. -.03; β = -.43), deep learning strategies (rs = .17 vs. .05;
β = -.30), and adaptive surface learning strategies (rs = .23
vs. .07; β = -.40). For each, the normative goal’s effect was
significant and positive, while the appearance goal’s effect was
null. Both goals positively predicted overall competence perceptions,
but the normative goal did so more strongly (rs = .25 vs.
.13; β = -.25). Importantly, this was true whether the competence
perception measure emphasized social comparison (rs = .31
vs. .15; β = -.33) or the task only (rs = .20 vs. .13; β = -.18).
Likewise, the appearance goals had stronger effects than normative
goals on two undesirable outcomes, self-handicapping
(rs = .16 vs. .03; β = .48) and help-avoidance (rs = .16 vs. .05;
β = .47). For both, the appearance goal’s effect was significant
and positive, while the normative goal’s effect was null.

Neither  of  these  two  PAp  goals  predicted  help-seeking.  Also, 
although  maladaptive  surface  strategies  seem  more  tied  to  ap¬ 
pearance  goals  than  normative  goals  (rs  =  .42  vs.  .05),  that 
comparison  is  nonsignificant  and  unreliable  because  only  two 
studies  used  the  appearance  goal.  Finally,  the  two  PAp  goals 
also  did  not  differ  in  their  effects  on  any  of  the  positive  or 


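Table 3

Mean Effect Sizes for All PAp Goal Subtypes

| Outcome | Normative K | Normative N | Normative weighted r | Appearance K | Appearance N | Appearance weighted r | General K | General N | General weighted r | No majority K | No majority N | No majority weighted r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Competence perceptions | 125 | 59,877 | .25*** | 47 | 15,842 | .13*** | 24 | 5,585 | .01 | 8 | 1,952 | .20** |
| Relative to task | 73 | 24,586 | .20*** | 34 | 11,662 | .13*** | 22 | 5,264 | -.02 | 3 | 602 | .14 |
| Relative to peers | 52 | 35,291 | .31*** | 13 | 4,180 | .15*** | 2 | 321 | .35** | 5 | 1,350 | .23** |
| Study strategies | | | | | | | | | | | | |
| Self-regulation | 23 | 10,351 | .16*** | 12 | 3,082 | -.03 | 3 | 541 | .08 | 1 | 434 | .18 |
| Deep strategy | 34 | 13,690 | .17*** | 11 | 3,153 | .05 | 3 | 762 | .06 | 0 | | |
| Adaptive surface strategy | 19 | 7,076 | .23*** | 6 | 636 | .07 | 2 | 378 | .33*** | 0 | | |
| Maladaptive surface strategy | 8 | 3,482 | .05 | 2 | 333 | .42 | 1 | 678 | .43** | 0 | | |
| Self-handicapping | 12 | 3,414 | .03 | 7 | 3,557 | .16** | 1 | 285 | .14 | 1 | 63 | -.16 |
| Help-avoidance | 9 | 2,366 | .05 | 8 | 2,885 | .16*** | 0 | | | 1 | 443 | .17 |
| Help-seeking | 7 | 1,191 | -.01 | 11 | 2,821 | .04 | 2 | 373 | .21* | 0 | | |
| Emotions | | | | | | | | | | | | |
| Negative affect | 14 | 5,917 | .09*** | 10 | 1,975 | [illegible] | 1 | 248 | .15 | 1 | 177 | .27** |
| Anxiety | 57 | 17,568 | [illegible]*** | 7 | 1,583 | .14** | 6 | 2,957 | .08 | 3 | 717 | .09 |
| Positive affect | 15 | 6,111 | .12*** | 6 | 893 | .11* | 1 | 248 | .01 | 0 | | |
| Enjoyment | 30 | 9,335 | [illegible] | 2 | 442 | .15 | 3 | 2,286 | .20*** | 0 | | |

Note. K = number of independent effect sizes in meta-analysis; N = number of participants. All effect sizes (weighted r) use a random effects model.

* p < .05. ** p < .01. *** p < .001.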
Table 4

Moderator Analyses of PAp Goal Subtypes

| Outcome and PAp goal comparison | Qb | β | B | z score | 95% CI LB | 95% CI UB |
|---|---|---|---|---|---|---|
| Competence perceptions | | | | | | |
| Normative vs. Appearance | 63.58*** | -.25*** | -.11 | 3.94 | -.17 | -.06 |
| Normative vs. General | | -.33*** | -.20 | 5.06 | -.27 | -.12 |
| Normative vs. No majority | | -.06 | -.06 | 1.05 | -.18 | .05 |
| Sample age | | [illegible]*** | -.01 | 3.57 | -.02 | -.01 |
| Relative to task | | | | | | |
| Normative vs. Appearance | 39.89*** | -.18* | -.07 | 2.20 | -.14 | -.01 |
| Normative vs. General | | -.41*** | -.20 | 4.78 | -.28 | -.11 |
| Normative vs. No majority | | -.07 | -.07 | 0.78 | -.26 | .11 |
| Sample age | | -.19* | -.01 | 2.27 | -.01 | .00 |
| Relative to peers | | | | | | |
| Normative vs. Appearance | 18.81*** | -.33*** | -.16 | 3.34 | -.26 | -.07 |
| Normative vs. General | | .07 | .09 | 0.72 | -.15 | .33 |
| Normative vs. No majority | | -.12 | -.09 | 1.22 | -.24 | .06 |
| Sample age | | -.25** | -.01 | 2.59 | -.02 | -.01 |
| Study strategies | | | | | | |
| Self-regulation | | | | | | |
| Normative vs. Appearance | 7.07 | -.43** | -.19 | 2.64 | -.32 | -.05 |
| Normative vs. General | | -.09 | -.07 | 0.59 | -.32 | .17 |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.04 | -.01 | 0.38 | -.02 | .01 |
| Deep strategy | | | | | | |
| Normative vs. Appearance | 7.56 | -.30* | -.13 | 2.19 | -.24 | -.02 |
| Normative vs. General | | -.14 | -.10 | 1.13 | -.27 | .07 |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.27* | -.01 | 1.99 | -.02 | -.01 |
| Adaptive surface strategy | | | | | | |
| Normative vs. Appearance | 4.89 | -.40* | -.19 | 2.07 | -.36 | -.01 |
| Normative vs. General | | .13 | .09 | 0.69 | -.17 | .39 |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.14 | -.01 | 0.74 | -.03 | .02 |
| Maladaptive surface strategy | | | | | | |
| Normative vs. Appearance | 9.05 | .35 | .24 | 0.90 | -.18 | .75 |
| Normative vs. General | | .39 | .28 | 1.19 | -.18 | .74 |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.23 | -.01 | 0.54 | -.05 | .03 |
| Self-handicapping | | | | | | |
| Normative vs. Appearance | 4.79 | .48* | .13 | 2.01 | .01 | .25 |
| Normative vs. General | | – | – | – | – | – |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | .13 | .00 | 0.54 | -.01 | .02 |
| Help-avoidance | | | | | | |
| Normative vs. Appearance | 5.42 | .47* | .10 | 1.98 | .01 | .20 |
| Normative vs. General | | – | – | – | – | – |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | .11 | .01 | 0.46 | -.01 | .01 |
| Help-seeking | | | | | | |
| Normative vs. Appearance | 15.65** | .07 | .02 | 0.39 | -.09 | .14 |
| Normative vs. General | | .43* | .23 | 2.39 | .04 | .42 |
| Normative vs. No majority | n/a | – | – | – | – | – |
| Sample age | | [illegible] | -.01 | 3.32 | -.02 | -.01 |
| Emotions | | | | | | |
| Negative affect | | | | | | |
| Normative vs. Appearance | 0.39 | .06 | .01 | 0.30 | -.07 | .09 |
| Normative vs. General | | – | – | – | – | – |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.11 | .00 | 0.58 | -.01 | .01 |
| Anxiety | | | | | | |
| Normative vs. Appearance | 2.44 | -.02 | -.01 | 0.15 | -.10 | .09 |
| Normative vs. General | | -.16 | -.07 | 1.36 | -.17 | .03 |
| Normative vs. No majority | | -.07 | -.04 | 0.56 | -.20 | .11 |
| Sample age | | .09 | .00 | 0.46 | -.01 | .01 |
| Positive affect | | | | | | |
| Normative vs. Appearance | 14.77*** | .09 | .03 | 0.62 | -.07 | .14 |
| Normative vs. General | | – | – | – | – | – |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.63*** | -.02 | 3.84 | -.02 | -.01 |
| Enjoyment | | | | | | |
| Normative vs. Appearance | 1.25 | .03 | .01 | 0.14 | -.21 | .25 |
| Normative vs. General | | .19 | .09 | 0.98 | -.09 | .27 |
| Normative vs. No majority | | – | – | – | – | – |
| Sample age | | -.17 | -.01 | 0.86 | -.02 | .01 |

Note. QB is variability in the overall effect size accounted for by
differences among the PAp goal subtypes. β and B are standardized (beta)
and unstandardized regression coefficients, respectively. LB and UB refer
to lower and upper bounds of the 95% confidence interval (CI). Significant
negative coefficients for the goal comparisons indicate that the PAp
goal’s effect size is stronger for the Normative type than for the
comparison PAp goal subtype. Dashes indicate the dummy code was omitted
from the regression due to an insufficient number of studies using the
PAp type being compared to Normative goals.
* p < .05. ** p < .01. *** p < .001.


negative emotion outcomes, all of which were positively predicted
by both goals.

Normative versus General PAp goals. Few studies used general
goals (see Table 3), so for most outcomes normative goals and
general goals did not produce different patterns. The lone
exceptions were that normative goals had a positive and stronger
link than general goals with overall competence perceptions
(rs = .25 vs. .01; β = -.33), including those ‘purer’ ones measured
solely in reference to the task (rs = .20 vs. -.02; β = -.41), yet
also a null and weaker link with help-seeking (rs = -.01 vs. .21;
β = .43), though the latter should be treated with skepticism
because only two help-seeking studies used general goals.

Normative versus NoMajority PAp goals. The NoMajority
category captures a combination of normative, appearance, and
general goals, with none the most salient. The broadness of this
category renders it imprecise. Fortunately, only four measures
fit this category (see Table 1), making it poorly suited for
moderator tests. Indeed, this type of PAp goal failed to produce
different effects from normative goals on any outcome.

Age effects. The analyses above show that normative and
appearance goals yield many different effects. We tested if any
of these differences are explained by an underlying age confound.
Across the full sample of studies, sample age did not correlate
with the %PAp-Normative goal measure (r = -.04), but it did
correlate with the %PAp-Appearance goal measure, r = -.14,
p < .05. Thus, as in Hulleman et al.’s (2010) meta-analysis, the
more that the PAp goal measure emphasized competence
demonstration, the younger the study’s sample was overall.

We therefore included sample age as a covariate in the
moderator analyses described above. Sample age did moderate
some overall PAp goal effects: the PAp goal’s links with
competence perceptions, deep strategies, help-seeking, and
positive affect were stronger for younger samples (see Table 4).
Nevertheless, those age effects are separate from the normative
versus appearance PAp goal effects, which remain even when
controlling for sample age.

Publication  Bias  Analysis 

This meta-analysis included only published articles. In general,
that approach risks inflating effect size estimates due to
journals  favoring  statistically  significant  findings  (Rosenthal, 
1979).  We  considered  the  risk  on  conceptual,  statistical,  and 
comparative  grounds. 

Conceptually, publication bias seems unlikely here because, of
the many reasons for rejecting a manuscript, a null PAp goal
effect is unlikely to be one. After all, PAp goals should produce
mostly null or undesired effects, according to goal theory.
Regardless, even if excluding unpublished studies does inflate
these effect sizes, we consider this only a minor limitation
because, unlike in the typical meta-analysis, estimating an
accurate overall effect size is not our aim. Rather, our aim is
to discern if the inconsistent PAp goal effects in published
research are due to differences in how researchers have
measured PAp goals.

Nevertheless, we statistically examined whether the overall
PAp goal effect sizes could be attributable to publication bias.3
Because those effect sizes hover in the ‘small’ range (r = .10,
or d = .20; Cohen, 1992), if they were substantially inflated by
publication bias, then the true population effect sizes must
verge on zero. Orwin’s (1983) fail-safe formula allows a
calculation of the number of additional unpublished studies needed
to reduce the significant overall PAp goal effect sizes to null
levels (i.e., r_criterion = .01). As shown in Table 2, this value
is so large, ranging from 274 studies for self-handicapping to
7,692 studies for competence perceptions, that it is implausible
that the current effect sizes were heavily inflated by
publication bias.

3 These analyses focus on the overall effect size of PAp goals because,
if there is a publication bias, it should apply equally to all subtypes
of PAp goals.
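
The fail-safe calculation described above can be sketched in a few lines of Python. This is a minimal illustration of Orwin's formula, N_fs = k(r_obs - r_criterion) / (r_criterion - r_missing), under the common assumption that the unpublished studies average a null effect (r_missing = 0). The function name and the example inputs are hypothetical. They are not taken from Table 2, and the authors' exact computation may differ.

```python
def orwin_failsafe_n(k, mean_effect, criterion, missing_effect=0.0):
    """Number of additional studies averaging `missing_effect` that would be
    needed to pull the mean effect of `k` studies down to `criterion`.

    Assumes positive effects with missing_effect < criterion (hypothetical
    helper written for illustration only).
    """
    if criterion <= missing_effect:
        raise ValueError("criterion must exceed the assumed missing-study effect")
    if mean_effect <= criterion:
        # The observed mean is already at or below the criterion.
        return 0.0
    return k * (mean_effect - criterion) / (criterion - missing_effect)


# Hypothetical example: 20 studies with a mean r of .10 and a criterion of .01.
n_needed = orwin_failsafe_n(k=20, mean_effect=0.10, criterion=0.01)
print(f"Additional null studies needed: {n_needed:.0f}")
```

Because the criterion sits so close to zero, even a small observed effect implies a fail-safe number many times larger than the number of studies analyzed, which is the logic behind the large values reported in the text.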

Finally, it is useful to compare the overall effect sizes with
those from previous meta-analyses, each of which included
unpublished studies and thus should be relatively free of the
bias (Baranik et al., 2010; Cellar et al., 2011; Huang, 2011a;
Payne et al., 2007). Table 5 provides this comparison for any of
our outcomes that were tested in at least one other
meta-analysis; the list includes all but surface learning
strategies, self-handicapping, and help-avoidance. Three things
stand out. First, the prior meta-analyses are generally
consistent with one another, except that Baranik et al. (2010)
found stronger overall effects of PAp goals on competence
perceptions and positive affect. Second, the present findings are
consistent with the prior meta-analyses, too, in terms of both
the direction and the overall size of the effects. The only
notable exceptions were that our effects for self-regulation
(r = .10) and enjoyment (r = .15) were stronger than Cellar et
al.’s (2011) and Huang’s (2011a), respectively. Third, for every
outcome in this comparison, our sample size of studies (k) was
two to six times larger than those used in prior meta-analyses,
as should be expected given the timeline of publication dates.
The last two points, together, impart some confidence in the
reliability of the present findings, and also in those findings
from the prior meta-analyses that relied on a small number of
studies.

Discussion 

Achievement  goal  theory  has  amassed  copious  research  in  over 
30  years.  In  that  time,  PAp  goals  have  spawned  an  ongoing  debate 
about  their  positive  potential  (Brophy,  2005;  Harackiewicz  et  al., 
1998;  Midgley  et  al.,  2001),  a  debate  anchored  to  their  surprising 
and  unique  mix  of  harmful  and  beneficial  effects  (Senko  et  al., 
2011). 

Our meta-analysis further confirms those mixed effects. PAp
goals predicted many educational outcomes; in fact, they
predicted all those within our catalog except help-seeking.
Curiously, as in several prior meta-analyses (Baranik et al.,
2010; Cellar et al., 2011; Huang, 2011a; Payne et al., 2007),
these effects appear contradictory. PAp goals show an adaptive
profile in some respects: students pursuing them report higher
competence perceptions, greater use of self-regulation and deep
and adaptive surface learning strategies, and also more positive
affect and enjoyment, all with small effect sizes. Yet these same
goals also show a maladaptive profile: students pursuing them
report more self-handicapping, help-avoidance, and maladaptive
surface strategies, as well as more negative affect and anxiety,
again all with small effect sizes.

What  explains  this  mixed  pattern  for  PAp  goal  effects?  Several 
possible  moderators  have  been  proposed,  usually  in  the  form  of  a 
“matching  effect.”  For  example,  theorists  posit  that  PAp  goals  can 
be  beneficial  for  adult  students  or  on  simpler  tasks,  but  become 
harmful for young students or on challenging tasks (e.g., Grant &
Dweck,  2003;  Midgley  et  al.,  2001).  Neither  idea  has  strong 
evidence,  however.  Consider  the  PAp  goal’s  storied  link  with 
achievement.  Three  meta-analyses  show  that  effect  holds  true 
across  different  ages,  ethnicities,  and  continents  (Huang,  2011b; 
Hulleman  et  al.,  2010;  Van  Yperen,  Blaga,  &  Postmes,  2014); 
another  shows  that  the  effect  becomes  null,  rather  than  harmful, 
with  young  students  (Wirthwein  et  al.,  2013).  The  effect  on 
achievement  has  also  been  found  on  challenging  tasks  (see  Senko 
et  al.,  2013). 

The  present  study  examined  another  possibility  that  has 
emerged  in  recent  years — namely,  that  some  of  those  mixed  find¬ 
ings  for  PAp  goals  trace  to  differences  in  how  researchers  define 
these  goals  (Senko  et  al.,  2011).  The  easiest  way  for  a  meta¬ 
analysis  to  test  this  is  to  compare  prominent  goal  measures,  such 
as  the  Achievement  Goal  Questionnaire  (AGQ;  Elliot  &  Church, 
1997)  and  the  Patterns  of  Adaptive  Learning  Survey  (PALS; 
Midgley  et  al.,  2000),  as  has  been  done  in  prior  meta-analyses  of 
academic achievement (Huang, 2011b; Hulleman et al., 2010; Van
Yperen  et  al.,  2014;  Wirthwein  et  al.,  2013)  or  emotions  (Huang, 
2011a).  But  that  approach  has  two  shortcomings.  One  is  that  it  is 
exclusionary.  It  disallows  tests  of  the  many  other  goal  measures 
custom  made  by  researchers;  that  approach  would  exclude  25%  of 
the  studies  in  our  meta-analysis,  for  example.  The  other  is  that  it  is 
imprecise.  Both  the  AGQ  and  the  PALS,  by  far  the  most  used  goal 
measures,  have  been  revised  multiple  times,  each  iteration  further 
narrowing  the  PAp  goal’s  definition  to  hew  closer  to  the  research¬ 
ers’  guiding  framework.  The  AGQ  originally  included  items  that 
tap  both  normative  and  appearance  goals  (Elliot  &  Church,  1997), 
but now, in line with the goal standard model, it taps only
normative goals (Elliot & Murayama, 2008; Elliot et al., 2011). The
PALS originally emphasized normative and nongoal themes
(Midgley et al., 1998), but now, in line with the goal orientation
model, all of its items emphasize appearance themes while some
also tap normative themes (Midgley et al., 2000). Simply comparing
these measures, without taking into account their many revisions,
cannot afford the precision needed to test whether normative
and appearance goals truly differ.

Table 5

Comparison of Overall PAp Goal Effect Sizes in Present and Prior Meta-Analyses

Correlate                 Present findings
Competence perceptions    r = .19 (k = 204)
Self-regulation           r = .10 (k = 39)
Deep strategy             r = .15 (k = 48)
Help-seeking              r = .04 (k = 20)
Negative affect           r = .11 (k = 26)
Anxiety                   r = .13 (k = 73)
Positive affect           r = .11 (k = 22)
Enjoyment                 r = .15 (k = 34)

Prior meta-analyses: Payne et al. (2007); Baranik et al. (2010); Huang (2011a); Cellar et al. (2011)

r = .03 (k = 44)

r = .13 (k = 23)
r = -.01 (k = 10)

r = .16 (k = 15)

r = .26 (k = 8)

r = .03 (k = 9)
r = -.03 (k = 11)

r = .14 (k = 5)

r = .08 (k = 8)
r = .12 (k = 28)
r = .06 (k = 7)
r = .04 (k = 6)

r = .10 (k = 21)
r = .01 (k = 10)

Note. Effect sizes (r) are sample-weighted correlation coefficients. Correlates in present study unlisted here have not been tested in prior meta-analyses.
k = number of independent effect sizes in meta-analysis. Dashes mean the outcome was not tested in that meta-analysis.

Recognizing  this,  Hulleman  et  al.  (2010)  painstakingly  coded  every 
measure’s  individual  items,  and  then  classified  each  measure  based  on 
the  predominant  theme  running  through  those  items.  Their  results  are 
provocative: Measures emphasizing normative strivings positively
predicted academic achievement, whereas those emphasizing
competence demonstration negatively predicted achievement. But the
two types of PAp goals had identical null effects on interest, the other
outcome tested by Hulleman et al. Given that academic achievement
may sometimes be a poor proxy for learning (Soderstrom & Bjork,
2015), some goal theorists (e.g., Brophy, 2005) speculate that
normative goals facilitate only achievement and carry various other
costs that offset the gains in achievement. Is the benefit of
normative goals, and any contrasting effects of normative versus
appearance goals, confined to achievement?

No,  it  is  not.  Using  Hulleman  et  al.’s  (2010)  coding  scheme  on  48 
different  PAp  goal  measures,  we  found  that  their  finding  generalizes 
to  many  other  educational  outcomes.  Normative  goals  predicted  high 
competence  perceptions  and  only  adaptive  strategies  (self-regulation, 
plus  deep  and  adaptive  surface  strategies),  whereas  appearance  goals 
predicted  only  maladaptive  strategies  (self-handicapping  and  help- 
avoidance).  The  differential  patterns  were  not  as  robust  as  Hulleman 
et  al.’s  (2010);  normative  and  appearance  goals  had  opposing  positive 
versus  negative  links  with  achievement  in  their  study,  but  the  two  PAp 
goals  generally  had  null  versus  significant  effects  on  the  outcomes 
studied  here  (see  Table  3).  Still,  a  clear  pattern  emerges  across  both 
meta-analyses:  quite  simply,  normative  goals  seem  more  adaptive 
than  appearance  goals. 

There  were  two  clear  exceptions  to  this  pattern,  however.  One  is 
that  neither  goal  predicted  help-seeking,  though  nor  was  either  ex¬ 
pected  to.  The  other  is  that  normative  and  appearance  goals  share 
similar  relationships  with  emotions;  that,  too,  is  unsurprising,  because 
there is little theoretical rationale to expect these goals to promote
different  emotions.  What  does  surprise,  however,  is  that  both  PAp 
goals  positively  predicted  positive  emotions  (positive  affect  and  en¬ 
joyment)  and  negative  emotions  (negative  affect  and  anxiety)  alike, 
and  to  the  same  small  degree.  One  possibility  is  that  PAp  goals 
produce  positive  and  negative  emotions  simultaneously.  This  “mixed 
feelings”  explanation,  however,  assumes  that  positive  and  negative 
emotions  correlate  positively  with  one  another,  but  in  fact  they  usually 
correlate  negatively  or  not  at  all  (e.g.,  Pekrun  et  al.,  2006,  2009; 
Watson  et  al.,  1988).  Furthermore,  when  the  two  do  co-occur,  it  is  in 
atypical  situations  (e.g.,  transitions,  or  succeeding  while  witnessing  a 
friend  fail)  and  more  likely  in  East  Asian  cultures  because  of  their 
more  dialectic  patterns  of  thinking  (Miyamoto,  Uchida,  &  Ellsworth, 
2010).  The  mixed  feelings  measured  in  such  studies  are  general 
positive  and  negative  affect  or  happiness-dejection  emotions.  Anxiety 
(a  prospective  outcome-based  emotion)  and  enjoyment  (a  current 
task-based  emotion),  two  specific  emotions  predicted  by  PAp  goals, 
diverge  even  more  and  seem  to  us  less  compatible.  An  alternate 
explanation  is  that,  in  any  study,  some  students  pursuing  PAp  goals 
experience positive emotions, and others negative emotions. If true,
one wonders what predicts the direction their emotions take. We can
only  speculate.  One  possibility  is  that  it  depends  on  timing  and 
confidence.  In  the  initial  learning  experience  (e.g.,  early  semester), 
activity-based  emotions  like  enjoyment  (or  boredom)  predominate, 
and  it  may  be  that  PAp  goals  (like  MAp  goals)  promote  mostly 
positive  emotions  then.  But  as  assignments  come  into  focus,  emotions 
become  highly  contingent  on  the  outcome,  especially  for  students 
pursuing  PAp  goals  (Pekrun  et  al.,  2006).  Those  anticipating  success 
will likely experience positive outcome-based emotions like hope and
pride; those less confident in success will experience negative
outcome-based emotions like anxiety. This remains for future
research to test directly.

The  competence  perception  and  surface  learning  outcomes  each 
merit  special  attention  because  of  how  they  are  measured.  Some 
competence  perception  measures  refer  solely  to  confidence  in  doing  a 
task  well  (e.g.,  Elliot  &  Church,  1997;  Midgley  et  al.,  2001),  whereas 
others  also  encourage  respondents  to  compare  ability  levels  with  peers 
(e.g.,  Marsh,  1990;  Pintrich  &  De  Groot,  1990).  The  two  do  overlap 
a  great  deal  (Bong  &  Skaalvik,  2003),  and  our  findings  show  that  both 
types  correlate  more  strongly  with  normative  goals  than  appearance 
goals.  Still,  the  normative  goal’s  effect  size  is  stronger  when  compe¬ 
tence  perception  measures  emphasize  social  comparison.  Likewise, 
mastery  goals  appear  to  have  stronger  effects  on  self-efficacy  mea¬ 
sures  that  emphasize  learning  or  improvement  rather  than  social 
comparisons  (Senko  &  Hulleman,  2013).  Such  findings  spotlight  the 
potential  for  either  goal’s  effect  to  be  inflated  due  to  overlap  in  content 
between  the  goal  and  outcome  measures. 

Measures  of  the  surface  learning  strategy  are  inconsistent  as  well. 
Some  researchers  (e.g.,  Pintrich,  2004)  define  this  strategy  solely  in 
terms  of  rehearsal  efforts,  and  they  characterize  it  as  adaptive  and 
compatible  with  deep  learning  and  self-regulation  strategies.  Other 
researchers  (e.g.,  Biggs  et  al.,  2001)  define  it  as  rehearsal  done  out  of 
confusion  or  work-avoidance,  and  they  characterize  it  as  maladaptive 
and  incompatible  with  the  other  strategies.  Goal  researchers  have 
historically  ignored  this  distinction,  however,  leading  many  to  inter¬ 
pret  the  PAp  goal’s  oft-cited  link  to  surface  learning  as  a  maladaptive 
effect. Separating the two measures in our analyses showed that
normative goals predicted only the adaptive type of surface strategy
(as well as the deep learning and self-regulation strategies), and that
they did so much more strongly than appearance goals.4

4 A few other outcomes also varied in how they were measured, but
those differences proved to be irrelevant in this meta-analysis. First,
competence perception measures vary in another respect separate from the
task versus social comparison distinction. Some competence perception
measures emphasize judgments of current ability, whereas others
emphasize prospective expectancies (e.g., self-efficacy). Second, some
measures of self-handicapping capture task-specific handicapping (e.g.,
Urdan & Midgley, 2001), whereas others capture more general tendencies
to handicap (e.g., Rhodewalt, 1990), though both types were treated as
goal outcomes in virtually all studies in the meta-analysis. Third, some
measures of anxiety emphasize state-like feelings that are modeled as
goal outcomes, whereas others emphasize trait-like feelings that are
modeled as a goal antecedent (Spielberger, 1989). The same is also
possible for negative and positive affect (Watson, Clark, & Tellegen,
1988), but nearly all studies in the meta-analysis treated those
measures as goal outcomes. Supplemental analyses showed that the key
findings (i.e., normative vs. appearance goal comparisons) reported in
this paper held true for either version of each of these outcome
measures.

A second purpose of this meta-analysis was to unconfound
goal measures and sample age. In our and Hulleman et al.’s
(2010) meta-analyses, appearance goals, but not normative
goals, were more likely to be used in research on younger
students. It is plausible, therefore, that any maladaptive effects
found for appearance goals could really be attributable to
younger students being less comfortable with any type of PAp
goal. The present study used a metaregression technique that
tested the independent effects of PAp goal subtypes versus
sample age. Age did moderate some of the overall PAp goal
effects; the goal’s links with competence perceptions, deep
learning strategies, help-seeking, and positive affect were all
more positive for younger samples. Yet age did not diminish or
explain any of the significantly different effects of normative
versus appearance goals.
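
The logic of separating goal subtype from sample age can be sketched as a weighted regression of study effect sizes on a dummy code and a covariate. The Python below is a minimal illustration only. It assumes a fixed-effect (inverse-variance weighted) model on Fisher-transformed correlations, with Normative goals as the reference category, and it uses hypothetical study data. The function names are invented for the example, and this section of the paper does not specify the authors' software or estimator.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def metaregression(effects, variances, design):
    """Fixed-effect weighted least squares: B = (X'WX)^-1 X'Wy, W = 1/variance."""
    weights = [1.0 / v for v in variances]
    p = len(design[0])
    xtwx = [[sum(w * row[a] * row[b] for w, row in zip(weights, design))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w * row[a] * y for w, row, y in zip(weights, design, effects))
            for a in range(p)]
    cov = invert(xtwx)  # covariance matrix of the coefficients
    results = []
    for a in range(p):
        coef = sum(cov[a][b] * xtwy[b] for b in range(p))
        se = math.sqrt(cov[a][a])
        results.append({"B": coef, "se": se, "z": coef / se,
                        "ci": (coef - 1.96 * se, coef + 1.96 * se)})
    return results


# Hypothetical studies: (correlation, sample size, appearance dummy, mean age).
studies = [(0.25, 120, 0, 20.0), (0.20, 200, 0, 19.0), (0.22, 90, 0, 15.0),
           (0.05, 150, 1, 13.0), (0.10, 80, 1, 18.0), (0.02, 110, 1, 12.0)]
mean_age = sum(s[3] for s in studies) / len(studies)
effects = [math.atanh(r) for r, _, _, _ in studies]        # Fisher z
variances = [1.0 / (n - 3) for _, n, _, _ in studies]      # variance of Fisher z
design = [[1.0, float(d), age - mean_age] for _, _, d, age in studies]

labels = ["intercept", "appearance vs. normative", "sample age"]
for label, res in zip(labels, metaregression(effects, variances, design)):
    low, high = res["ci"]
    print(f"{label}: B = {res['B']:.3f}, z = {res['z']:.2f}, "
          f"95% CI [{low:.3f}, {high:.3f}]")
```

With this coding, a negative coefficient on the dummy means the effect is weaker for Appearance measures than for Normative ones at a given sample age, which matches the sign convention in the note to Table 4.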

This  meta-analysis  included  two  other  PAp  goal  categories  as 
well:  General  goals  and  NoMajority  goals.  General  goals  were 
a  catch-all  category  for  items  not  emphasizing  normative  com¬ 
parisons  or  competence  demonstration.  They  instead  empha¬ 
sized  either  avoiding  challenge  or,  in  most  cases,  attaining 
desired  outcomes  (e.g.,  wanting  to  do  well  or,  in  academic 
contexts,  to  get  a  good  grade).  Those  measures  (e.g.,  Pintrich  et 
al.,  1993),  created  early  in  goal  theory’s  development,  were 
often  referred  to  as  outcome  (or  extrinsic)  goals,  a  label  that 
reflects  the  initially  blurred  lines  between  achievement  goal 
theory  and  other  theories  (e.g.,  Deci  &  Ryan,  1985;  Zimmerman 
&  Kitsantas,  1999)  that  contrast  adaptive  (mastery  goals,  in¬ 
trinsic  motivation,  process  focus)  versus  maladaptive  (perfor¬ 
mance  goals,  extrinsic  motivation,  outcome  focus)  motivations. 
Outcome  goals  are  certainly  common,  perhaps  much  more  so 
than  MAp  or  PAp  goals.  But  their  conceptual  value  is  less  clear. 
They  focus  on  the  rewards  of  achievement  more  than  attaining 
competence  itself.  In  principle,  these  broader  outcome  goals 
could  be  achieved  through  task  mastery  or  by  outperforming 
others,  and  indeed  some  studies  show  that  outcome  goals  offer 
little  unique  predictive  value  beyond  those  two  traditional 
achievement  goals  (e.g.,  Grant  &  Dweck,  2003).5  Most  achieve¬ 
ment  goal  theorists  (e.g.,  Elliot  &  Thrash,  2001;  Midgley  et  al., 
2000)  therefore  abandoned  outcome  goals  and  defined  PAp 
goals  solely  with  appearance  or  normative  themes.  Conse¬ 
quently,  there  were  too  few  studies  using  general  PAp  goal 
measures  to  include  in  most  of  our  moderator  analyses.  In  fact, 
only  two  educational  outcomes  included  more  than  five  studies 
with  general  goals,  and  the  one  reliable  effect  is  that  the  general 
goals  were  significantly  less  likely  than  normative  goals  to 
predict  competence  perceptions.  The  final  PAp  goal  subtype, 
the  NoMajority  goal  category,  applied  to  measures  in  which  the 
items  sampled  too  broad  a  range  of  themes  for  any  one  of  them 
(e.g.,  normative  comparison)  to  predominate.  There  were  only  a 
few  such  measures  (e.g.,  Greene  et  al.,  2004)  and  too  few 
studies using them to allow meaningful tests of their unique
impact  on  any  outcomes. 

Limitations 

This  meta-analysis  is  limited  in  several  respects.  Two  concern 
statistical  power.  First,  because  so  few  studies  use  appearance 
goals  in  tests  of  maladaptive  surface  strategies  or  enjoyment,  we 
recommend  caution  in  interpreting  their  effect  sizes  for  that  goal. 
Fortunately,  those  two  tests  seem  to  be  the  only  underpowered 
ones among the main analyses; the number of studies in each of
our overall analyses far surpasses that of prior meta-analyses (see
Table 5), and the remaining outcomes had a sufficient number of studies to
permit  reliable  comparisons  of  appearance  and  normative  goals 
(see  Table  3).  Second,  this  meta-analysis  excluded  some  important 
outcomes  because  there  were  too  few  studies  overall,  or  else 
because  virtually  all  studies  used  only  the  normative  PAp  goal.  So 
we  cannot  know  whether  the  patterns  shown  here  extend  to,  for 
example,  interpersonal  competencies  (e.g.,  collaboration,  belong¬ 
ingness),  moral  judgments  (e.g.,  cheating),  or  student  well-being 
(aside  from  emotions).  For  the  same  reason,  we  also  excluded  all 
established  goal  antecedents — whether  features  of  the  student 
(e.g.,  beliefs  about  ability  being  fixed,  perfectionism),  the  task 
(e.g.,  novelty,  difficulty),  the  learning  context  (e.g.,  student- 
teacher  relationships,  or  ‘goal  structures’  cultivated  through  in¬ 
structional  and  evaluation  methods),  or  the  broader  culture  (see 
Senko,  2016,  for  a  review  of  antecedents).  Such  work  will  be  a 
fruitful  direction  for  future  studies. 

Finally,  as  with  the  research  it  compiled,  this  meta-analysis 
cannot  allow  causal  conclusions  about  the  relationships  between 
goals  and  correlates.  Are  the  correlates  goal  outcomes?  Or  are  they 
goal  antecedents?  They  all  certainly  fit  well  as  outcomes  concep¬ 
tually;  virtually  every  included  study  treated  them  that  way,  in  fact, 
with  the  exception  of  anxiety  sometimes  being  a  dispositional 
antecedent  (see  footnote  4).  Moreover,  we  know  from  many  lab¬ 
oratory  studies  that  experimentally  induced  goals  can  directly 
impact  a  variety  of  student  outcomes,  such  as  interest  and  task 
performance (for meta-analyses, see Rawsthorne & Elliot, 1999,
and  Van  Yperen  et  al.,  2015).  Clearly,  goals  can  provide  causal 
effects.  But  those  goals  are  also  dynamic  and  responsive  to  ongo¬ 
ing  experience.  It  is  probably  best,  therefore,  to  view  many  goal- 
outcome  links  as  reciprocal  over  time  (Harackiewicz  et  al.,  2002; 
King & McInerney, 2016; Linnenbrink & Pintrich, 2002; Putwain,
Larkin,  &  Sander,  2013;  Senko  &  Harackiewicz,  2005;  Van 
Yperen  &  Renkema,  2008).  For  example,  the  link  between  nor¬ 
mative  goals  and  academic  achievement  is  bidirectional:  pursuing 
the  goal  facilitates  achievement  attributable  to  various  processes 
aroused  by  the  goal,  and  achievement  reinforces  continual  pursuit 
of  the  goal  (Van  Yperen  &  Renkema,  2008).  Similar  patterns  may 
be  true  of  the  normative  goal’s  links  with  competence  perceptions, 
self-regulation,  and  deep  and  surface  learning  strategies.  Likewise, 
appearance  goals  may  lead  students  to  avoid  help  or  to  self¬ 
handicap,  and  the  success  of  these  strategies  at  minimizing  nega¬ 
tive  evaluations  from  teachers  or  peers  may  reinforce  the  contin¬ 
ued  pursuit  of  that  goal.  Reciprocal  patterns  should  also  apply  to 
PAv,  MAp,  and  MAv  goals,  of  course. 

5 In fact, on the rare occasion that outcome goals are tested nowadays,
they are classified by some goal orientation theorists as a type of
performance goal (e.g., Bong, Woo, & Shin, 2013; Sideridis, Kaplan,
Papadopoulos, & Anastasiadis, 2014) and by some goal standards theorists
as a type of mastery goal (Tuominen-Soini, Salmela-Aro, & Niemivirta,
2012). In both cases, the outcome goal is defined by what it is not: as a
performance goal because it lacks the mastery goal’s focus on learning
within the goal orientation model, or as a mastery goal because it lacks
the performance goal’s focus on competition within the goal standard
model.

Theoretical Implications

Now that it is clear the two PAp goals differ not just conceptually
but also empirically, the field must decide how best to
proceed. One option is to choose one goal (and its corresponding
achievement goal model) and abandon the other. But this is
divisive and implausible; after decades of research, both goals are
too entrenched to abandon. Nor is it clear that either the goal
orientation model or the goal standard model is superior to the
other, whether on theoretical grounds or empirical grounds (Senko,
2016). The field instead needs a progressive approach, one that
incorporates appearance and normative goals. There are two ways
to do this.

Comparing  both  PAp  goals.  One  way  is  to  compare  the  two 
types  of  PAp  goals  side-by-side.  Recent  studies  have  begun  to  do 
this  (Grant  &  Dweck,  2003;  Edwards,  2014;  Hackel,  Jones,
Carbonneau,  &  Mueller,  2016;  Senko  &  Tropiano,  2015;  Warburton  &
Spray,  2014).  Each  confirms  that  normative  and  appearance  goals 
separate  in  factor  analyses  and,  more  importantly,  that  they  produce
different  effects  that  match  the  meta-analytic  findings.  In
samples  of  high  school  or  college  students,  normative  goals  promoted
self-efficacy  (Edwards,  2014;  Senko  &  Tropiano,  2015)  and
achievement  (Warburton  &  Spray,  2014),  whereas  appearance 
goals  promoted  ability  attributions  for  failure  (Grant  &  Dweck, 
2003),  effort  withdrawal  and  self-handicapping  (Grant  &  Dweck, 
2003;  Senko  &  Tropiano,  2015),  help-avoidance  (Senko  &  Tropiano,
2015),  and  disinterest  (Edwards,  2014).  Additional  studies
are  needed  to  generalize  this  pattern  to  other  cultures  and  learning 
contexts. 

Moving  forward,  we  also  need  to  learn  when  the  two  PAp 
goals  diverge  or  overlap,  and  why.  We  suspect  that  normative 
and  appearance  goals  arouse  somewhat  different  processes,  and 
that  they  therefore  will  diverge  most  for  outcomes  that  are 
sensitive  to  those  processes.  Consider  first  appearance  goals. 
Striving  to  appear  talented  should,  according  to  goal  orientation 
theorists  (Maehr,  1984;  Nicholls,  1984),  arouse  public  self- 
consciousness  and  worrisome  thoughts  that  disrupt  task  focus 
and  impair  learning  and  achievement,  especially  when  doing 
challenging  tasks.  Appearance  goals  should  predict  outcomes 
rooted  to  these  processes — for  example,  self-handicapping  and 
help-avoidance,  as  well  as  socially  influenced  emotions  such  as 
anxiety  during  task  engagement,  shame  following  failure,  and 
relief  following  success.  Normative  goals  do  not  share  this  fate. 
They  need  not  arouse  self-consciousness  or  impair  task  focus. 
Instead,  owing  to  their  normative  standard  for  goal  attainment, 
they  have  two  unique  qualities.  First,  they  are  usually  challenging,
more  so  than  mastery  goals  in  fact  (Senko  &  Hulleman,
2013),  because  attaining  them  requires  outperforming  most  or
all  peers.  Second,  because  these  standards  are  typically  set  by 
others  (e.g.,  teachers),  they  are  also  rigid  and  outside  the 
student’s  control.  These  two  qualities  may  deter  unconfident 
students  (Van  Yperen  &  Renkema,  2008)  and  arouse  moderate 
anxiety  in  students  chasing  these  goals,  but  they  also  can 
promote  engagement  (Lee  et  al.,  2003)  and  effort  (Elliot  et  al.,
1999),  attune  students  to  task  demands  (Senko,  Hama,  &  Belmonte,
2013),  and  proffer  positive  outcome-based  emotions
such  as  hope  or  pride  (Pekrun  et  al.,  2006).  Studies  are  needed
to  explore  these  possibilities. 

This  basic  principle  should  also  apply  to  the  antecedents  of 
each  PAp  goal.  Perhaps  appearance  and  normative  goals  correlate
more  strongly,  and  produce  similarly  undesirable  effects,
when  sharing  a  potent  antecedent.  For  example,  the  two  might 
converge  more  in  heavily  evaluative  contexts.  Many  students
are  likely  to  pursue  appearance  goals  in  such  conditions,  and
those  also  pursuing  normative  goals  probably  do  so  because 
they  believe  that  outperforming  peers  will  impress  evaluators. 
The  two  goals  work  in  concert  in  those  cases — exactly  as  the 
original  goal  orientation  model  proposed  (Ames,  1992;  Dweck, 
1986;  Nicholls,  1984).  The  added  challenge  of  the  normative 
standard  (i.e.,  having  to  outperform  most  others)  might  even 
amplify  the  inimical  processes  aroused  by  appearance  goals. 
Recent  experiments  by  Sideridis  et  al.  (2014)  offer  tentative 
evidence.  Participants  completed  a  novel  task  in  a  highly  evaluative
context,  and  those  given  a  very  difficult  normative  goal
(i.e.,  to  outperform  everyone  else  who  has  done  the  study) 
suffered  more  anxiety  than  those  not  given  the  normative  goal. 
Perhaps  the  same  normative  goal,  however,  would  produce 
more  beneficial  effects  if  pursued  in  less  evaluatively  threatening
contexts,  and  therefore  without  the  accompaniment  of  an
appearance  goal.  Senko  and  Harackiewicz  (2005)  found  preliminary
evidence  for  this.  Their  participants  pursuing  normative
goals  were  more  engaged  and  interested  when  doing  the  task 
within  a  relatively  neutral  context  instead  of  an  evaluatively 
threatening  one.  Those  two  papers  suggest  that  some  contexts 
summon  appearance  and  normative  goals  together,  whereas 
others  allow  students  more  freedom  to  pursue  normative  goals 
without  appearance  goals.  Similar  patterns  might  emerge  for 
other  goal  antecedents,  even  dispositional  ones.  For  example, 
perhaps  the  two  goals  overlap  more  among  test-anxious  students
than  other  students.

Comparing  the  two  goals  has  obvious  appeal.  First,  it  is 
simple  methodologically.  Second,  it  sharpens  the  field’s  focus 
on  processes  elicited  by  PAp  goals.  And  third,  it  is  conciliatory, 
adopting  a  ‘big  tent’  approach  rather  than  insisting  the  field 
continue  with  only  one  of  these  two  PAp  goals.  Notwithstanding
these  positives,  however,  we  believe  this  approach’s  long-term
potential  is  limited  in  one  crucial  way:  it  fails  to  truly
integrate  the  goal  orientation  and  goal  standard  frameworks  that 
created  the  two  different  PAp  goals. 

Integrating  both  PAp  goals.  The  better  long-term  option,  in 
our  opinion,  would  integrate  rather  than  compare  the  two  PAp 
goals.  There  may  be  several  ways  to  do  so  (e.g.,  Korn  &  Elliot, 
2016).  The  leading  contender  at  this  juncture  is  the  goal  complex
model  (Elliot  &  Thrash,  2001;  Urdan,  2000;  Vansteenkiste
et  al.,  2014).  Essentially  a  compromise  between  the  goal  orientation
and  goal  standard  models,  this  model  assumes  that  goal
standards  (interpersonal  for  PAp  goals,  intrapersonal  for  MAp 
goals)  are  the  ‘true’  achievement  goal,  but  it  also  assumes  that 
these  goals  can  be  pursued  for  a  variety  of  reasons  and  that 
those  reasons  help  shape  the  achievement  goal’s  effects.  Put 
another  way,  a  goal  complex  is  a  hierarchy  of  two  goals:  a 
higher-order  one  focused  on  the  broad  purpose  or  reason  for 
engaging  in  a  task,  in  line  with  the  goal  orientation  model,  plus 
a  slightly  lower-order  goal  focused  on  the  standard  for  defining 
success  when  engaging  the  task,  in  line  with  the  goal  standard 
model.  According  to  this  model,  a  PAp  goal  standard  (outperforming
others)  serves  the  higher-order  reason,  the  two  together
creating  a  ‘goal  complex’  that  guides  student  experience.  Thus, 
a  normative  goal  can  be  pursued  to  appear  talented,  and  it  is 
likely  in  such  cases  to  have  the  maladaptive  effects  long  theorized
for  PAp  goals.  But  the  same  normative  goal  can  also  be
pursued  for  other  reasons,  such  as  pride  or  enjoyment  of
competing,  and  its  effects  in  those  cases  may  be  more  adaptive.
Several  recent  studies  of  goal  complexes  support  this  premise 
(e.g.,  Dompnier,  Darnon,  &  Butera,  2013;  Gillet,  Lafreniere, 
Vallerand,  Huart,  &  Fouquereau,  2014;  Michou,  Vansteenkiste, 
Mouratidis,  &  Lens,  2014;  Senko  &  Tropiano,  2015;  Vansteenkiste,
Mouratidis,  &  Lens,  2010).  Indeed,  it  seems  typical  for
students  who  pursue  normative  goals  to  do  so  for  healthier
reasons  than  a  desire  to  appear  talented  (Michou  et  al.,  2014;
Senko  &  Tropiano,  2015;  Urdan  &  Mestas,  2006).  This  might 
explain  why  normative  goals  generally  are  more  beneficial  than 
appearance  goals  on  the  whole.  New  research  is  needed  to 
further  probe  this  possibility,  to  chart  the  full  range  of  PAp  goal 
complexes,  and  to  identify  the  triggers  of  each.  In  fact,  all  of  the 
research  directions  suggested  above  for  comparing  normative 
and  appearance  goals  can  also  be  accomplished  with  the  goal 
complex  model.

Conclusion 

It  is  now  clear  that  normative  goals  and  appearance  PAp 
goals  often  differ,  the  former  behaving  more  adaptively  overall 
than  the  latter.  Looking  forward,  the  field  must  determine 
whether  to  study  the  two  goals  together  or  to  integrate  them 
somehow,  as  with  a  goal  complex  model.  In  the  meantime,  we 
encourage  researchers  to  use  meaningful  goal  labels  that  specify 
the  type  of  PAp  goal  used  (e.g.,  “normative”  or  “appearance”) 
and  also  their  overarching  achievement  goal  model  (i.e.,  goal 
standards  or  goal  orientations).  This  practice  will  ensure  greater 
alignment  between  a  study’s  methods  and  guiding  theoretical 
model.  It  will  also  allow  readers  to  more  easily  distinguish  the 
two  PAp  goals,  compartmentalize  their  respective  findings,  and 
generate  clearer  new  directions. 
Appendix

Search  Terms  Used  for  Each  Outcome,  All  in  Conjunction  With  “Goal  Orientation*”  or  “Achievement  Goal*”


Competence  perceptions:  self-efficacy,  perceived  competenc*,  competence  expectanc*,  academic  self-concept,  academic  expect*

Study  strategies:  learning  strateg*,  superficial  learn*,  superficial  process*,  superficial  approach,  superficial  strateg*,  surface  learn*,  surface  process*,  surface  approach,  surface  strateg*,  shallow  learn*,  shallow  process*,  shallow  approach,  shallow  strateg*,  memoriz*,  rehears*,  deep  learn*,  deep  process*,  deep  approach,  deep  strateg*,  elaborat*,  critical  think*

Self-regulation:  self-regulat*,  metacognit*

Self-handicapping:  self-handicap*,  impression  manage*

Help-avoidance  &  help-seeking:  help-avoid*,  help  threat*,  help-seek*,  feedback  seek*,  feedback  accept*

Emotions:  emotion,  negative  affect*,  anxiety,  fear,  worry,  positive  affect*,  enjoyment

Note.  To  expand  the  search,  “*”  endings  allow  any  variation  of  ending  to  the  root  term. 